The document discusses the challenges of extracting insight from big data and earning trust when using data. It examines fears about data leading to a trivial culture drowned in irrelevance (Huxley) or loss of privacy and control due to surveillance (Orwell). While size is important, the bigger challenge with big data is the variety of structured and unstructured data from diverse sources. Extracting meaning from human language in data requires rigorous analytical approaches. Case studies show traditional methodologies must adapt to address issues like multiple languages, spam filtering, relevance categorization, and consistency. Ensuring accurate, transparent analysis is key to gaining insight from data while protecting privacy and trust.
Simpler, Clearer, Faster Government Services Thoughtworks
Paul Shetler is the CEO of the Digital Transformation Office within the Australian Government, and was previously an executive at the UK's highly regarded Government Digital Service.
At ThoughtWorks Live Australia 2016, he shared how he is leading the transformation to simpler, clearer, and faster government services using a user-centred design approach.
The 100 Leading Global Fintech Innovators 2015 H2 Ventures
We are pleased to present the second annual ‘Fintech 100’, the best fintech innovators, this year from 19 countries around the world.
The Fintech 100 are those companies using technology to the best advantage and driving disruption within the financial services industry. These companies have a commitment to excellence, superior customer experience and a demonstrated ability to do one thing in a market better than everyone else.
The Fintech 100 includes the leading 50 fintech companies across the globe, and the most intriguing 50 ‘emerging stars’ – exciting new fintechs with bold, disruptive and potentially game-changing ideas – expanding on the success of last year’s list.
Visit www.fintechinnovators.com for more information
Blockchain the inception of a new database of everything by dinis guarda bloc... Dinis Guarda
Blockchain the inception of a new database of everything by Dinis Guarda blockchain age
Trends and questions?
1. Redefinition of banking and relation with Blockchain
Mobile App banking finance – mobile ledgers – blockchain identity
New products and the emergence of DAO products.
2. System Legacies in parallel with advanced tech - Ethereum.
3. Distribution Strategy in a new Digitalised World who own what.
4. Super computer Cloud base blockchain solutions / infrastructure.
5. Emergence of AI IOE in relation with blockchain all connected.
6. User Experience, UI, UE, Big data and the IOE blockchain touching.
7. Blockchain Cyber Security and Value Reinvention.
A somewhat longer version of my Frontiers talk about technology and the future of the economy, with additional material pitched to an audience of Internet operators at Apricot 2017, in Ho Chi Minh City, Vietnam on February 27, 2017
We’re leaking, and everything’s fine: How and why companies deliberately leak... Ian McCarthy
Although the protection of secrets is often vital to the survival of organizations, at other times organizations can benefit by deliberately leaking secrets to outsiders. We explore how and why this is the case. We identify two dimensions of leaks: (1) whether the information in the leak is factual or concocted and (2) whether leaks are conducted overtly or covertly. Using these two dimensions, we identify four types of leaks: informing, dissembling, misdirecting, and provoking. We also provide a framework to help managers decide whether or not they should leak secrets.
Social Data Intelligence: Integrating Social and Enterprise Data for Competit... Susan Etlinger
This report lays out a mandate for enterprise organizations to integrate social data into other enterprise data streams, or risk building a "social silo." Includes best practices, frameworks, and a social data maturity map.
Skillsoft Strategy: Harnessing the Power of Big Data Skillsoft
John Ambrose, SVP, Strategy Corporate Development and Emerging Business at Skillsoft, explores why big data is one of the hottest buzzwords in technology. Big data is already changing industries from retail to healthcare to transportation and more.
How can the learning industry benefit from big data? Skillsoft is undertaking groundbreaking research in collaboration with IBM, the biggest name in big data. John Ambrose shared some of the early findings of a multi-phase joint development agreement between Skillsoft and IBM Research to leverage the learning interactions of millions of learners to create more personalized, adaptive enterprise learning experience – in order to predict what content and topics learners will need based on a variety of factors including job role, company, and even day of the week.
In addition to sharing Skillsoft's efforts to harness the power of big data to transform enterprise learning, John shared other new developments and areas of strategic focus that Skillsoft is working on to bring the latest in learning innovation to our customers.
By 2020 more than 7 billion people will be communicating and performing transactions over the web on over 35 billion devices. So how can companies effectively create a digital identity that promises security, ease and comfort for its customers? This study, sponsored by Oracle, assesses the role identity plays in the digital economy. Visit hub: http://bit.ly/1LKqXfN
The biggest disruption of the digital age is the need to extract insight from data in a way that engenders trust. To make the best use of data, executives need to educate themselves — and use this insight to plan their data strategy now.
This document proposes a framework to better understand and address: 1) How we extract insight from data, and 2) How we use data in such a way as to earn and protect trust: the trust of customers, constituents, patients, and partners
Download the full report at: http://pages.altimetergroup.com/what-do-we-do-with-all-this-big-data-report.html
Self-employed, "1099" workers represent the new face of America's economy. Here, Core Innovation Capital examines this fundamental shift in the nature of work, the ramifications that 1099 status has on Americans' financial lives, and the technology companies that are rising to address novel financial pain points.
Slides from my keynote w/ Capgemini in Copenhagen - looking at how Microsoft, GE, KLM and Uber use Salesforce Marketing Cloud to innovate, disrupt and build customer relationships faster.
SlideShare now has a player specifically designed for infographics. Upload your infographics now and see them take off! Need advice on creating infographics? This presentation includes tips for producing stand-out infographics. Read more about the new SlideShare infographics player here: http://wp.me/p24NNG-2ay
This infographic was designed by Column Five: http://columnfivemedia.com/
Are Millennials as reluctant to work for the government as the conventional wisdom suggests? A deeper dive into survey data indicates a more complex story—and steps that public agencies should consider to attract and retain younger workers. Learn more about Millennials in government in our latest report: http://deloi.tt/1PC6fWr
Consumers rely on businesses to keep their personal information safe. Too few of those businesses are actively protecting that data. Here’s what’s gone wrong, and how businesses should be responding. Full blog here: http://bit.ly/1Jtzym5
Strategies for the Age of Digital Disruption #DTR7 Capgemini
Since 2000, 52% of companies in the Fortune 500 have either gone bankrupt, been acquired or ceased to exist. These are challenging times for companies as the speed, volume and complexity of change intensify. Disruption can happen at any time, in any sector, and its effect on traditional organizations can be fundamental. This is why we chose to dedicate our seventh edition of the Digital Transformation Review to digital disruptions. How can organizations survive and thrive in the age of digital disruptions? We posed this very question to a panel of industry leaders, academics, startup founders, analysts and technology gurus from three different continents.
Working with our global panel, we have built a detailed picture of the digital disruption phenomenon, probing the key questions that organizations need answers to:
• How can we plan for the emergence of disruptors?
• Why are we seeing so many disruptions?
• How can organizations respond to disruption?
• What shape are these disruptions taking?
• Which startups are likely to emerge to disrupt sector value chains over the coming years?
We hope this edition of the Digital Transformation Review has helped increase understanding of the disruptive and challenging times we live in. Join the conversation on twitter #DTR7
This is First Round's effort to provide an in-depth snapshot of what it's like to run a technology startup in 2018. We surveyed over 520 venture-backed founders who volunteered their experience and opinions.
Keeping it real - How authentic is your Corporate Purpose? Burson-Marsteller
Burson-Marsteller and Swiss-based IMD have been working together to research corporate purpose since 2008. This year’s study is presented in the context of the findings of Burson-Marsteller’s Corporate Perception Indicator, a global survey of public hopes and expectations of companies and their leaders.
Collaborative Storytelling: Presentation at Startupfest 2013 Susan Etlinger
Presented at Startupfest 2013
Nearly 100 years ago, the French Surrealists invented a game they called “Le Cadavre Exquis” (“Exquisite Corpse”), in which they would collectively create a story using words and images. Today, customers, partners, friends, competitors and others collectively write and share the stories of organizations with which they interact. In this session, industry analyst Susan Etlinger will share examples of how both startups and established brands use social data to create a more holistic picture of their customers’ needs, wants and aspirations, and how startup entrepreneurs can use this data to build their brands and develop mutually valuable and sustainable relationships.
Watch the webinar replay at: http://www.slideshare.net/Altimeter/recording-four-steps-brands-can-take-to-design-internet-of-things-experiences
Download the report at: http://www.altimetergroup.com/2015/03/new-research-customer-experience-in-the-internet-of-things/
The challenge for many brands is making sense of IoT — what it is, what it isn't — and how, where, and when to apply IoT to consumer-facing programs. In this 1-hour webinar, Jessica Groopman and Charlene Li will share practical ways brands can evaluate the opportunity and how to get started.
The Future Of Business by Altimeter Group Charlene Li
What will the future of business be? Altimeter Group provides its takes on the ways emerging technologies challenge business, and what they must do to address them from four perspectives.
Leading Digital Transformation: Putting People First Charlene Li
Slides for speech at HR Tech Expo by NCHRA on August 25, 2017. Based on research by Prophet "HR as a Force for Digital Change" available at https://goo.gl/qu7rN3.
Description: Transformations are never easy, and the digital transformation is doubly so because of the technology angle. HR leaders must work the fine line between pushing executives and teams to be agile and change faster, while still enabling the organization to deliver on near-term objectives.
We'll examine the challenges and opportunities that digital creates, and the crucial role that HR leaders play in bringing about the transformation needed to help your organization thrive in the digital era.
Why a content marketing software request for proposal (RFP)?
Against a backdrop of hundreds of content marketing vendors, with new ones emerging all the time, brands are challenged to articulate their content marketing needs. This makes creating an RFP and asking the right questions both of internal stakeholders as well as vendors incredibly difficult from the start.
In this 45-minute webinar, analyst Rebecca Lieb shares best practices for your content marketing software selection process, including her research on the content marketing software landscape.
Watch the webinar on-demand at: http://www.slideshare.net/Altimeter/webinar-content-marketing-software-rfp-by-altimeter-group
Download Altimeter's Content Marketing Software RFP template at: www.altimetergroup.com/content-marketing-software-rfp.
What's the dollar value of a well-timed tweet? How do you turn a blog post into a revenue stream? Did a "like" really increase the value of my brand?
In this 1-hour webinar, industry analysts Susan Etlinger and Rebecca Lieb share their latest research on Content Marketing Performance: A Framework to Measure Real Business Impact. Using a pragmatic measurement framework, they’ll share how you can measure your content marketing and content strategy efforts, including six value propositions and sample metrics for each.
Download the full report at: http://pages.altimetergroup.com/content-marketing-performance-report.html
Watch the webinar at: http://www.slideshare.net/Altimeter/webinar-content-marketing-metrics-altimteter-group
This is a presentation that's part of a series in which LinkedIn Influencers analyze the state and future of their industry. You can read the posts at https://www.linkedin.com/channels/the_economy?trk=prod-inf-myindustry-0325-cutline
Social Data Intelligence: Webinar with Susan Etlinger Susan Etlinger
This webinar covers the findings from the Altimeter Group report, Social Data Intelligence, which lays out the imperative for organizations to integrate social data with other data streams in the enterprise. Includes best practices and frameworks, as well as a maturity map to enable organizations to make the best and most strategic use of social data.
Slides for Altimeter's webinar: The Inevitability of a Mobile-Only Customer Experience
Watch the webinar replay at: http://www.slideshare.net/Altimeter/webinar-mobile-only-customer-experience-altimeter-group
Download the report at: http://pages.altimetergroup.com/mobile-only-customer-experience-report.html
Description: Customers are becoming increasingly mobile, and, as a result, the customer journey is in need of an overhaul. In this 1-hour webinar, join Jaimy Szymanski and Brian Solis for a discussion on how organizations can approach mobile design strategy through the lens of an evolving connected customer.
"The Engaged Leader" at SXSW Interactive Charlene Li
Presentation by Charlene Li at SXSW Interactive, Austin, TX on Saturday, March 14, 2015 (Pi Day)
Title: Creating A Digital Engagement Strategy for Leaders
Description: Digital and social technologies have revolutionized relationships – and leadership is not immune. Despite the pressure to engage, leaders remain on the sidelines, paralyzed by fear and the unknown. We’ll look at how leaders can use technology to listen, share, and engage with employees and customers, at scale. We’ll also discuss common objections and concerns of leaders – and how to address them.
Slides for Altimeter's webinar: A Culture of Content. Watch the webinar replay at: http://www.slideshare.net/Altimeter/webinar-a-culture-of-content-by-altimeter-group.
Description:
When content becomes an ingrained part of an organization's culture, content strategy functions like a well-oiled engine, producing, circulating, and begetting content, creating numerous efficiencies in the process.
In this 1-hour webinar, learn how companies evangelize, reinforce, and institutionalize the importance of content throughout and beyond the marketing organization. Rebecca Lieb and Jessica Groopman share findings and recommendations from their report, Culture of Content.
Download the report at: pages.altimetergroup.com/culture-of-content-report.html
With every consumer expected to own up to 20 or more connected devices by the year 2020 (Source: Intel), the Internet of Things (IoT) is a channel for engagement that brands can't afford to ignore. Yet many companies are mystified by IoT, and how it fits in with their digital strategy. Watch this 1-hour webinar with Jessica Groopman and Charlene Li to learn about Altimeter's latest research on how brands can enhance the customer experience through IoT.
Watch the webinar replay at: https://www.slideshare.net/Altimeter/webinar-customer-experience-in-the-internet-of-things
Download the report at: http://pages.altimetergroup.com/customer-experience-in-the-internet-of-things-report.html
McKinsey partner Jason Heller provides an overview of the key technologies that will impact the business landscape: artificial intelligence, automation and impact on the future workforce, virtual reality, augmented reality, the Internet of things, and data security.
McKinsey's Jennifer Stanley goes beyond the latest research about when to use digital and when not to. Digital might be the answer, but what is the question? Clearly digital is a game changer for sales organizations that do it well and are in the lead. B2B players that embed digital in their go-to market programs grow >5x faster than their peers and have 30% higher acquisition efficiency.
Top Digital Transformation Trends and Priorities for 2016 Charlene Li
Given the importance of digital transformation and the never-ending onslaught of new technologies, how should organizations prioritize limited resources, time, and attention? This presentation to the San Francisco American Marketing Association marks the 7th year in a row that Charlene has presented her take on the top digital trends.
Notes from the Observation Deck // A Data Revolution gngeorge
Notes from the Observation Deck will provide you with an examined look at the interesting phenomena and trends taking place around us today. We present them to you with the hope of sparking broader conversations, debates and ideas. Please use this as a resource for knowledge, inspiration and enjoyment.
Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong... IT Network marcus evans
Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong Value-Adding Proposition
by Patrick Hadley, Australian Bureau of Statistics at the Australian CIO Summit 2014
Big data is a phenomenon brought about by rapid data growth, complex, new, and changing data types, and parallel technology advancements; it brings huge possibilities. By optimizing these enormous amounts of structured and unstructured data, CSPs are in a unique position to capture these opportunities and create new revenue streams.
Communications of the Association for Information SystemsV.docx monicafrancis71118
Communications of the Association for Information Systems
Volume 34 Article 65
5-2014
Tutorial: Big Data Analytics: Concepts,
Technologies, and Applications
Hugh J. Watson
University of Georgia, [email protected]
Follow this and additional works at: http://aisel.aisnet.org/cais
This material is brought to you by the Journals at AIS Electronic Library (AISeL). It has been accepted for inclusion in Communications of the
Association for Information Systems by an authorized administrator of AIS Electronic Library (AISeL). For more information, please contact
[email protected]
Recommended Citation
Watson, Hugh J. (2014) "Tutorial: Big Data Analytics: Concepts, Technologies, and Applications," Communications of the Association
for Information Systems: Vol. 34, Article 65.
Available at: http://aisel.aisnet.org/cais/vol34/iss1/65
Volume 34 Article 65
Tutorial: Big Data Analytics: Concepts, Technologies, and Applications
Hugh J. Watson
Department of MIS, University of Georgia
[email protected]
We have entered the big data era. Organizations are capturing, storing, and analyzing data that has high volume,
velocity, and variety and comes from a variety of new sources, including social media, machines, log files, video,
text, image, RFID, and GPS. These sources have strained the capabilities of traditional relational database
management systems and spawned a host of new technologies, approaches, and platforms. The potential value of
big data analytics is great and is clearly established by a growing number of studies. The keys to success with big
data analytics include a clear business need, strong committed sponsorship, alignment between the business and
IT strategies, a fact-based decision-making culture, a strong data infrastructure, the right analytical tools, and people
skilled in the use of analytics. Because of the paradigm shift in the kinds of data being analyzed and how this data is
used, big data can be considered to be a new, fourth generation of decision support data management. Though the
business value from big data is great, especially for online companies like Google and Facebook, how it is being
used is raising significant privacy concerns.
Keywords: big data, analytics, benefits, architecture, platforms, privacy
Volume 34, .
Big Data for International Development Alex Rascanu
Alex Rascanu delivered the "Big Data for International Development" presentation at the International Development Conference that took place on February 7, 2015 at University of Toronto Scarborough.
Shiny Object or Digital Intelligence Hub? Evolution of the Social Media Comma...Susan Etlinger
This report provides an industry update, best practices, and frameworks for understanding how to approach and build a social media command center that integrates with other digital and enterprise signals in the business.
Shiny Object or Digital Intelligence Hub? Evolution of the Social Media Comma...Susan Etlinger
This report provides an industry update on the evolution of the social media command center--from a social media-driven function to a digital intelligence hub for the enterprise.
Canary in the Coalmine: How Social Media Can Prepare Us for Big Data Susan Etlinger
In this talk, I'll discuss how organizations are addressing the challenges of social data (technological, organizational, and cultural) and what it can teach us on the road to big data.
This is a talk I originally prepared for the Alchemist Series (www.alchemistseries.com) about working with industry analysts. I'd appreciate any other tips and suggestions from analysts, as well as feedback from entrepreneurs. Cheers!
This report is intended primarily for business people who are tasked with understanding,
interpreting, and acting on social data—executives, strategic planners, social strategists,
and marketers. It will outline the key challenges of social data, propose a value-based
framework for social analytics, and recommend clear and pragmatic steps that companies
engaged in social media must follow to ensure they are gaining insights, measuring effectively,
interpreting accurately, and taking appropriate action—both today and in the longer term.
1. What Do We Do with
All This Big Data?
Fostering Insight and Trust in the Digital Age
A Market Definition Report
January 21, 2015
By Susan Etlinger
Edited by Rebecca Lieb
2. Introduction
Every day, we hear new stories about data: how much there is, how fast it
moves, how it’s used for good or ill. Data ubiquity affects our businesses,
our educational and legal systems, our society, and increasingly, our
dinner-table conversation. I had the opportunity to speak at TED@IBM
in San Francisco on September 23, 2014, about the implications of a
data-rich world, and what we can do as businesspeople, citizens, and
consumers, to use it to our best advantage.1
That talk, as well as this document, examines two themes that underlie
many conversations about data and technology that correspond to fears
that George Orwell and Aldous Huxley chronicled in their novels 1984 and
Brave New World. As the culture critic Neil Postman put it in his 1985 book,
Amusing Ourselves to Death:
What Orwell feared were those who would ban books. What
Huxley feared was that there would be no reason to ban a book,
for there would be no one who wanted to read one. Orwell
feared those who would deprive us of information. Huxley
feared those who would give us so much that we would be
reduced to passivity and egotism. Orwell feared that the truth
would be concealed from us. Huxley feared the truth would
be drowned in a sea of irrelevance. Orwell feared we would
become a captive culture. Huxley feared we would become a
trivial culture.2
These two themes—irrelevance and narcissism on one hand (Huxley) and
surveillance and power on the other (Orwell)—anticipate modern fears
about the explosion of data in our personal and professional lives. As
individuals, we crave insight and convenience, yet we simultaneously fear
loss of control over our privacy and our digital identities.
3. Photo: Daniel K. Davis/TED
Susan Etlinger
speaking at TED@IBM at SFJAZZ, San Francisco, California, September 23, 2014.
4. Table of Contents
What’s So Hard About Big Data? ... 5
With Big Data, Size Isn’t Everything ... 6
Unstructured Data Demands New Analytical Approaches ... 8
Traditional Methodologies Must Adapt ... 10
From Data to Insight ... 13
Big Data Requires Linguistic Expertise ... 14
Big Data Requires Expertise in Data Science and Critical Thinking ... 14
Legal and Ethical Issues of Big Data ... 17
Planning for Data Ubiquity ... 21
Conclusion ... 23
Executive Summary
This document proposes an approach to better understand and address:
• How we extract insight from data
• How we use data in such a way as to earn and protect trust: the trust of customers,
constituents, patients, and partners
To be clear, these twin challenges of insight and trust will occupy data scientists, engineers,
analysts, ethicists, linguists, lawyers, social scientists, journalists, and, of course, the public for
many years to come. To derive insight from data while protecting and sustaining trust with
communities, organizations must think deeply about how they source and analyze it and clarify and
communicate their roles as stewards of increasingly revealing information. This is only a first step,
but it’s a critical one if we are to derive sustainable advantage from data, big and small.
6. WITH BIG DATA, SIZE ISN’T EVERYTHING
The idea of big data isn’t new; it was defined in the late ’90s by analysts at META Group (now
Gartner Group). According to META/Gartner, big data has three main attributes, known as
the Three Vs:
• Volume (the amount of data)
• Velocity (the speed at which the data moves)
• Variety (the many types of data)3
Now nearly two decades old, this construct has become increasingly pertinent. As IBM has famously said,
“90% of all the data in the world was created in the past two years.”4
To understand why this is, we need to
compare the business conditions that existed when big data was originally defined with today’s. In the early
2000s, technologists were grappling with a burgeoning variety of data types, spurred in large part by the rise of
electronic commerce. Today, social media is a major catalyst of data proliferation. Consider that:
• 100 hours of video are uploaded to YouTube every minute.5
• On WordPress alone, users produce about 64.8 million new blog posts and 60.4 million new comments
each month.6
• 500 million tweets are sent per day.7
Much data is unstructured. It is, as Gartner defines it, “content that does not conform to a specific, pre-
defined data model. It tends to be the human-generated and people-oriented content that does not fit neatly
into database tables.”8
As a result, the primary challenge of what we think of as big data isn’t actually the size;
it’s the variety. For this reason, the term “big data” can sometimes be misleading.
If this seems counterintuitive, consider this example: the New York Stock Exchange (NYSE) recorded
approximately 9.3 billion shares traded on December 16, 2014, more than 18 times the average number of
tweets (approximately 500 million) created per day.9
Even though the number of trades is much larger than
the number of tweets (volume) and the speed of the market may change from hour to hour and day to day
(velocity), the basic attributes of a trade—price, trade time, change from previous trade, previous close, price/
earnings ratio, and so on—are the same every time. A trade is a trade. It is homogeneous and predictable from a
data perspective (variety).
In contrast, social data is far more complex and variable. While a tweet contains some structured data
(metadata about the time it was posted, the user who posted it, whether it includes hashtags or media, such as
photography, and other attributes), it can express anything that fits into 140 characters. It is a mix of structured
metadata and unstructured text and images that can be expressed with variable lengths, languages, meanings,
and formats. It can contain a news headline, a haiku, a sales message, or a random thought. For this reason, a
much smaller number of tweets can be far more complex to analyze from a data standpoint. Size isn’t everything.
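The contrast between a trade and a tweet can be sketched in a few lines of Python. This is a minimal illustration and not something drawn from the report: the `Trade` fields, the sample records, and the tweet dictionary keys are all hypothetical, chosen only to show that every trade has the same shape while the text of a tweet can hold anything that fits the length limit. The last lines reproduce the arithmetic behind the "more than 18 times" comparison.

```python
from dataclasses import dataclass, fields


# Hypothetical trade record: every trade carries exactly the same fields.
@dataclass(frozen=True)
class Trade:
    symbol: str
    price: float
    trade_time: str
    change_from_previous: float


# Invented sample data, for illustration only.
trades = [
    Trade("ABC", 10.00, "09:30:00", 0.00),
    Trade("ABC", 10.05, "09:30:01", 0.05),
    Trade("XYZ", 42.10, "09:30:01", -0.20),
]

# A tweet mixes structured metadata with free text (keys are assumptions).
tweets = [
    {"hashtags": ["news"], "has_media": False,
     "text": "Markets close higher after a volatile week"},
    {"hashtags": [], "has_media": True,
     "text": "Morning fog lifts / the harbor slowly wakes up / gulls argue again"},
    {"hashtags": ["sale"], "has_media": False,
     "text": "50% off everything, today only"},
]

# Variety: all trades share a single shape...
trade_shapes = {tuple(f.name for f in fields(t)) for t in trades}

# ...while the only constraint on tweet text is its length.
MAX_TWEET_LENGTH = 140
all_fit = all(len(t["text"]) <= MAX_TWEET_LENGTH for t in tweets)

# Volume: shares traded on December 16, 2014, versus average daily tweets.
shares_traded = 9.3e9
tweets_per_day = 500e6
volume_ratio = shares_traded / tweets_per_day

print(len(trade_shapes), all_fit, round(volume_ratio, 1))
```

The trades collapse to one shape no matter how many there are, which is why volume alone does not make them hard to analyze.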
7. The nature of human
language demands
rigorous and repeatable
processes to extract
meaning from it in
a transparent and
defensible way.
8. UNSTRUCTURED
DATA DEMANDS NEW
ANALYTICAL APPROACHES
The human-generated and people-oriented nature of
unstructured data is both an unprecedented asset and a
disruptive force. Data’s value lies in its ability to capture the
desires, hopes, dreams, preferences, buying habits, likes,
and dislikes of everyday people, whether individually or in
aggregate. The disruptive nature of this data stems from
two attributes:
• It’s raw material. It requires processing to translate it
into a format that machines, and therefore people, can
understand and act upon at scale.
• It offers a window into human behavior and attitudes.
When enriched with demographic and location
information, data can introduce an unprecedented
level of insight and, potentially, privacy concerns.
Unstructured data requires a number of processes and
technologies to:
• Identify the appropriate sources
• Crawl and extract it
• Detect and interpret the language being used
• Filter it for spam
• Categorize it for relevance (e.g., “Gap store” versus
“trade gap”)
• Analyze the content for context (sentiment, tone,
intensity, keywords, location, demographic information)
• Classify it so the business can act on it (a customer
service issue, a request for a product enhancement,
a question, etc.)
Each of these steps is rife with nuances that require both
sophisticated technologies and processes to address
(see Figure 1).
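The relevance step in the list above ("Gap store" versus "trade gap") can be illustrated with a deliberately naive sketch. The context word lists and the function name `categorize_gap_mention` are assumptions made for this example; real tools use far richer models, and even then, as noted above, results vary between tools.

```python
import re

# Hypothetical context words; a production system would not rely on
# hand-written lists like these.
RETAIL_CONTEXT = {"store", "jeans", "shopping", "mall", "sale"}
OTHER_CONTEXT = {"trade", "wage", "gender", "deficit", "budget"}


def categorize_gap_mention(text):
    """Guess whether a mention of "gap" refers to the retail brand."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if "gap" not in words:
        return "no mention"
    if words & RETAIL_CONTEXT:
        return "brand"
    if words & OTHER_CONTEXT:
        return "not brand"
    # Nothing in the text settles the question; leave it for a person.
    return "needs review"


for example in (
    "Bought jeans at the Gap store today",
    "The trade gap widened last quarter",
    "Mind the gap",
):
    print(example, "->", categorize_gap_mention(example))
```

The "needs review" branch is the important one: a rule that cannot decide should say so rather than guess, because silent misclassification is exactly the kind of small misstep that erodes trust in the results.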
The above challenges add up to a host of risks: missed
signals, inaccurate conclusions, bad decisions, high total
cost of data and tool ownership, and an inability to scale,
among others. Even a small misstep, such as a missing
source, a disparity in filtering algorithms, or a lack of
language support, can have a significant detrimental effect
on the trustworthiness of the results.
A recent story in Foreign Policy magazine provides a timely
example. “Why Big Data Missed the Early Warning Signs of
Ebola” highlights the importance of an early media report
published by Xinhua’s French-language newswire covering
a press conference about an outbreak of an unidentified
hemorrhagic fever in the Macenta prefecture in Guinea.10
The Foreign Policy article debunks some of the hyperbole
about the role of big data in identifying Ebola, not because
the technology wasn’t available (it was) or because the
indications weren’t there (they were), but because, as
author Kalev Leetaru writes, “part of the problem is that
the majority of media in Guinea is not published in English,
while most monitoring systems today emphasize English-
language material.”
9. FIGURE 1 CHALLENGES OF UNSTRUCTURED DATA
Steps and the challenge associated with each:
1. Identify Data Sources. Not all data sources provide reliable APIs
or consistent access.
2. Crawl and Extract Data. Different tools use different crawlers, which
can return different samples.
3. Detect and Interpret Language. Not all tools support multiple languages,
or support them equally well.
4. Filter for Spam. Different spam filtering algorithms can also
return different samples, accuracy levels.
5. Categorize for Relevance. Inconsistent levels of accuracy and
different approaches.
6. Analyze for Sentiment and Keywords/Themes. Sentiment analysis is highly
subjective and subject to interpretation or error. Even with human coding
(which reduces scalability) and machine learning, no tool is perfect.
7. Classify for Action. Requires both organizational and technology
resource to tag data so that it is appropriately
classified and shared with the right people.
10. TRADITIONAL
METHODOLOGIES
MUST ADAPT
Even in the unlikely event that all relevant data is in English
or another single language, there’s no guarantee that it
will be easy to interpret or that the path to doing so will
be clear. For this reason, researchers in both industry
and academia are grappling with the many challenges
that large, unstructured human data poses as a tool
for conducting scientific or business research. The
following provides an example of how one organization is
addressing these significant methodological issues.
Case Study: Health Media Collaboratory
Applying Methodological Rigor to Big Data
The Health Media Collaboratory (HMC) at the University
of Illinois at Chicago’s Institute for Health Research and
Policy is focused on understanding social data, most of
which is unstructured, to “positively impact the health
behavior of individuals and communities,” according
to its website. In the broadest sense, HMC’s mission is
to develop and propagate a new paradigm for health
media research using innovative strategies to apply
methodological rigor to the analysis of big data.11
In one recent project, HMC looked at how people talk about
quitting smoking on Twitter so that HMC and the Centers for
Disease Control and Prevention (CDC) could learn how they
might promote behavior change. The project explored two
questions about the impact, if any, of social data on smoking
cessation. The initial research questions were:
• How much electronic-cigarette promotion is there
on Twitter?
• How much organic conversation about electronic
cigarettes exists on Twitter?
In another project, HMC also looked at whether Twitter
could be used as a tool to evaluate the efficacy of health-
oriented media campaigns. In particular, the CDC wanted
to assess the impact of several provocative and graphic
television commercials, one of which featured a woman
with a hole in her throat. The questions HMC sought to
answer were:
• Did the commercials work?
• How can we prove it?
This type of research, as well as the data it presents, is
vastly different from fielding a conventional multiple-
choice survey in which the questions and answers are
predefined and results tabulate the percentage of answers
in each column. HMC instead had to determine, with an
appropriate level of confidence, how people talk about
smoking on Twitter and whether this data could serve as a
useful indicator of public opinion and even of likely behavior.
11.
To do this, the team needed to understand how much
of the Twitter conversation about smoking was spam,
how much was off topic (“smoking marijuana,” “smoking
ribs,” “smoking hot women”), and how much was relevant
(“I’ve really got to quit smoking cigarettes”). For the first
project, it also meant understanding how people talk about
electronic cigarettes in particular. Figure 2 is a recreation
of the search string HMC used in its research, illustrating
why this effort isn’t as simple as it might seem.
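The three-way split described above (spam, off topic, relevant) can be sketched as a first-pass triage. This is an illustration only, not HMC's method: the off-topic phrases are the ones quoted in the text, while the spam markers, the function name `triage`, and the sample tweets are invented. Anything labeled "candidate" would still need the human coding that HMC describes.

```python
from collections import Counter

# Off-topic phrases quoted in the report.
OFF_TOPIC_PHRASES = ("smoking marijuana", "smoking ribs", "smoking hot women")
# Hypothetical spam markers, invented for this sketch.
SPAM_MARKERS = ("click here", "free trial")


def triage(tweet_text):
    """Assign a rough first-pass label to a tweet that may mention smoking."""
    text = tweet_text.lower()
    if "smoking" not in text:
        return "no match"
    if any(marker in text for marker in SPAM_MARKERS):
        return "spam"
    if any(phrase in text for phrase in OFF_TOPIC_PHRASES):
        return "off topic"
    # Possibly relevant; a human coder still has to confirm.
    return "candidate"


# Invented sample tweets.
samples = [
    "I've really got to quit smoking cigarettes",
    "Smoking ribs in the backyard all afternoon",
    "Quit smoking today! Click here for a free trial",
    "Great game last night",
]

counts = Counter(triage(t) for t in samples)
print(dict(counts))
```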
The methodology that HMC used to collect, clean, and
analyze the Twitter conversation related to smoking
topics closely mirrors the big data challenges outlined
in Figure 1. While it adheres to the scientific method, it is
important to note that HMC itself devised this
methodology to account for the nuances and
challenges of unstructured data.
1. Data collection. Determine the appropriate source
and sample size of the data to be collected.
2. Keyword selection. Generate the most comprehensive
possible list of keywords, encompassing nonstandard
English usages, slang terms, and misspellings.
3. Metadata. Collect metadata related to the
tweets, including:
a. A tweet ID (a unique numerical identifier assigned
to each tweet)
b. The username and biographical profile of the
account used to post the tweet
c. Geolocation (if enabled by the user)
d. Number of followers of the posting account
e. The number of accounts the posting
account follows
f. The posting account’s Klout score
g. Hashtags
h. URL links
i. Media content attached to the tweet.
4. Filtering for engagement. Because engagement with
the campaign was the determining factor for relevance,
the team filtered tweets that described televised
commercials, later de-duplicating them to ensure that
tweets with multiple keywords would not be counted twice.
5. Human coding. Throughout the process, human
coders reviewed the data to assess relevance and code
message content.
Figure 2: How People Talk About E-Cigarettes
Key Words for E-Cigs
E cigarettes blue cigarette e cigarettes njoy cigarette e cigarettes blu cig e cigarettes njoy cig e cigarettes ecig e
cigarettes e cig e cigarettes @blucigs e cigarettes e-cigarette e cigarettes ecigarette ecigarettes from:blucigs e
cigarettes e-cigarette e cigarettes e-cigs e cigarettes ecigarettes e cigarettes e-cigarettes e cigarettes green
smoke e cigarettes south beach smoke e cigarettes cartomizer ecigarette (atomizer OR atomizers)-perfume e
cigarettes ehookah OR e-hookah e cigarettes ejuice OR ejuices OR e-juice OR e-juice ecigarettes eliquid OR
eliquids OR e-liquid OR e-liquids e cigarettes e-smoke OR e-smokes e cigarettes (esmoke OR esmokes)
sample:5 lang: en e cigarettes lavatube OR lavatubes e cigarettes logicecig OR logicecigs e cigarettes
smartsmoker e cigarettes smokestik OR Smokestiks e cigarettes v2 cig OR “v2 cigs” OR v2cig OR v2cigs vaper
or vapers OR vaping e cigarettes zerocig OR Zerocigs e cigarettes cartomizers e cigarettes e-cigarettes
FIGURE 2 HOW PEOPLE TALK ABOUT E-CIGARETTES
Source: University of Illinois at Chicago’s Institute for Health Research and Policy
12.
6. Precision and relevance. The team used a
combination of human and machine coding to assess
relevance and eliminate false positives, using three
teams of trained coders and a process to assess
intercoder reliability using a Kappa score, a statistic
“used to assess inter-rater reliability when observing or
otherwise coding qualitative/categorical variables.”12
According to HMC, “the human-coded tweets were then
used to train a naïve Bayes classifier to automatically
classify the larger dataset of Tips engagement tweets
for relevance. Precision was calculated as the percent
of Tips-relevant tweets yielded by the keyword filters.”13
7. Recall. To assess whether the tweet sample was
representative of and could be generalized to all
potentially relevant Twitter content, the team compared
its sample to a larger sample of unretrieved tweets,
again using trained coders and a Kappa score to
assess how well the filtered tweet sample represented
the larger data set.14
8. Content coding. Finally, the team coded the content
to better understand “fear appeals,” that is, whether the
user accepted, rejected, or disregarded the message.
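Two of the statistics named in steps 6 and 7 are straightforward to compute. The sketch below assumes two coders and uses Cohen's kappa; HMC used three teams of coders, and the source does not say which kappa variant it applied, so this illustrates the idea rather than HMC's procedure. The labels and sample data are invented, and `cohen_kappa` and `precision` are names chosen for this example.

```python
def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set")
    n = len(coder_a)
    # Observed agreement: share of items given the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement if each coder labeled at random with their own
    # label frequencies.
    labels = set(coder_a) | set(coder_b)
    expected = sum(
        (coder_a.count(label) / n) * (coder_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def precision(labels, relevant_label="relevant"):
    """Share of keyword-retrieved tweets that were coded as relevant."""
    return sum(label == relevant_label for label in labels) / len(labels)


# Invented codes for eight retrieved tweets.
coder_a = ["relevant"] * 4 + ["off topic"] * 4
coder_b = ["relevant", "relevant", "relevant", "off topic",
           "off topic", "off topic", "off topic", "relevant"]

kappa = cohen_kappa(coder_a, coder_b)
print(round(kappa, 2), precision(coder_a))
```

Here the coders agree on six of eight tweets (75%), but half of that agreement would be expected by chance, so kappa is 0.5. That gap between raw agreement and kappa is why a chance-corrected statistic is used to judge coder reliability.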
So, did the CDC’s graphic and disturbing anti-smoking ads
and the Twitter conversation surrounding them actually
lead people to quit? HMC didn’t overstate its data; rather, it
concluded that approximately 87% of the tweets about the
TV commercials expressed fear and that the ads had “the
desired result of jolting the audience into a thought process
that might have some impact on future behavior.”15
HMC’s case study illustrates that unstructured data
requires significant adaptations to analytics methodology
to extract meaning. Certainly it would have been a lot
simpler for the CDC to host a focus group or field a survey
to collect impressions about its anti-smoking campaign,
but that data, as comparatively simple as it would have
been to analyze, would lack the spontaneity and rich variety
of expression available on Twitter (or on other social networks,
had the team extended the research to other sources).
The nature of human language demands rigorous and
repeatable processes to extract meaning in a transparent
and defensible way. As a result, analytics methodology is
undergoing an explosive period of change.
14. [Diagram labels: Subject Matter Expertise; Access to Tools; Critical Thinking, Applied Statistics; Insights; Irrelevant Conclusions; Inability to Execute; Incorrect Conclusions]
BIG DATA REQUIRES
LINGUISTIC EXPERTISE
As counterintuitive as it might seem, an influx of
unstructured data demands not only new and more
sophisticated technologies to process and store it but a
renewed emphasis on the humanistic disciplines as well.
This is because, as Gartner has said, big data “tends to be
the human-generated and people-oriented content” rather
than highly structured data that fits neatly into databases.
Naturally, “human-generated and people-oriented content”
includes language, which is rife with contractions,
sarcasm, slang, and metaphors expressed in multiple
written forms, in hundreds of languages, 24 hours a day,
seven days a week.
Furthermore, language changes constantly, a fact Oxford
Dictionaries marks each November by publishing a word
of the year that encapsulates that year’s zeitgeist. 2014’s
word was “vape,” salient in light of HMC’s research. Five
years ago, “vape” would have been impossible to interpret,
because it—and its cultural context—didn’t exist yet.
A recent article in MIT Technology Review illustrates just
how quickly language and meaning can evolve, both in
obvious and subtle ways.16
Vivek Kulkarni, a PhD student
in the Data Science Lab at Stony Brook University, along
with several of his colleagues, used linguistic mapping
to illustrate the speed at which word meanings change,
gathering inputs from sources such as Google Books,
Amazon, and Twitter.
“Mouse” acquired an entirely new meaning following the
introduction of the computer mouse in the early 1970s, and
“sandy” changed literally overnight with Hurricane Sandy in
2012. Today we see a constant stream of examples both
of redefined words and of new ones (“vaping,” “selfie”) that
require both technological and humanistic expertise to map,
place in context, and understand.
BIG DATA REQUIRES
EXPERTISE IN DATA
SCIENCE AND CRITICAL
THINKING
The speed, size, and variety of data around us—and the
availability of platforms used to visualize and analyze
it—have democratized the function of analytics within
organizations. At the same time, fundamental analytics
education has lagged, creating a situation in which
organizations are at risk of misinterpreting data of all
kinds. Says Philip B. Stark, professor and chair of statistics
at the University of California, Berkeley, “the type of data
(structured, text, etc.) isn’t the point at all. The way of
thinking matters.”17
Stark emphasizes that good data science requires having
subject matter expertise, access to the appropriate
computational tools, and most importantly, critical thinking
and statistics skills. Figure 3 lays out the consequences of
overlooking any of these three foundational elements.
FIGURE 3 FUNDAMENTALS OF DATA SCIENCE
15.
1. Irrelevant conclusions. If tools and critical thinking
are present but subject matter expertise is absent, the
organization risks asking the wrong questions, which
can result in irrelevant conclusions and valueless
answers. In addition, the organization will lack the
context necessary to design experiments that will yield
the answers it needs. It will be unable to understand
the intrinsic limitations of the data, says Stark: noise,
sampling issues, response bias, measurement bias,
and so on. This creates a domino effect that can
squander resources and lead to ineffectual—or worse,
harmful—decisions.
2. Inability to execute. If subject matter expertise and
critical thinking are present, but tools are absent, the
organization will be unable to extract insights at scale
and must resort to time-consuming manual methods.
As a result, the organization risks burning out and
eventually losing top analysts, who now must focus on
brute-force methods of processing and analyzing data,
rather than using their skills for more sophisticated and
rewarding applications.
3. Incorrect conclusions. If subject matter expertise
and tools are present, but critical thinking and a
knowledge of applied statistics are absent, the
organization risks drawing the wrong conclusions from
good data, making poor decisions that may ignore
other critical business signals. Like a lack of subject
matter expertise, this can have harmful consequences
to decision making and, therefore, business results.
Given the spread of data throughout organizations and
the impracticality of hiring legions of trained analysts to
keep pace with its growth, the next step is to evolve from
analytics that simply describe a situation to analytics that
predict what may happen next and then to analytics that
prescribe a course of action.18
But even assuming access to the most sophisticated
algorithms that incorporate the most detailed business
knowledge, widespread access to data necessitates
that more people, irrespective of role, grasp the basics of
logic and statistics to understand that data. This doesn’t
mandate universal PhDs in applied statistics, but it does
require an awareness of basic principles of logic.
The good news is that, while the big data industry is still
in its infancy, many of the most valuable tools for analysis
are widely available—and more than two thousand years
old to boot. As early as 350 BCE, Aristotle described 13
logical fallacies, which logicians and philosophers have
built upon during the last 2,400 years.19
Ignoring these
fallacies leaves organizations vulnerable to a host of risks,
which can harm competitive position, financial success,
customer sentiment and trust, and other critical objectives.
One common example is mistaking correlation for
causation, in which organizations erroneously attribute
one outcome (for example, increased revenue) to a
corresponding data point (for example, reach of a
marketing campaign). The increasing use of technologies
that present complex data visually can exacerbate the
problem. Harvard law student Tyler Vigen succinctly (and
sometimes hilariously) presents this phenomenon on his
Spurious Correlations blog.20
16. Divorce rate in Maine correlates with per capita consumption of margarine (US)
Correlation: 0.992558
Year | Divorce rate in Maine (divorces per 1,000 people) | Per capita consumption of margarine, US (pounds)
2000 | 5.0 | 8.2
2001 | 4.7 | 7.0
2002 | 4.6 | 6.5
2003 | 4.4 | 5.3
2004 | 4.3 | 5.2
2005 | 4.1 | 4.0
2006 | 4.2 | 4.6
2007 | 4.2 | 4.5
2008 | 4.2 | 4.2
2009 | 4.1 | 3.7
FIGURE 4 MISTAKING CORRELATION FOR CAUSATION
Source: Tyler Vigen
In Figure 4, Vigen’s calculations show that there is a 99%
correlation between the divorce rate in Maine and per-
capita margarine consumption. Does the Maine divorce
rate somehow cause US residents to eat margarine? Does
US margarine consumption somehow lead to divorce in
Maine? While these questions are absurd, charts such as this
visually suggest a link.
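The correlation reported in Figure 4 can be recomputed from the ten yearly values shown in the chart. The sketch below uses the standard Pearson correlation coefficient; the function and variable names are chosen for this example, and the calculation is offered as a check on the figure, not as a description of how Vigen produced it.

```python
import math

# Yearly values for 2000 through 2009, as shown in Figure 4.
divorce_rate_maine = [5.0, 4.7, 4.6, 4.4, 4.3, 4.1, 4.2, 4.2, 4.2, 4.1]
margarine_lbs_per_capita = [8.2, 7.0, 6.5, 5.3, 5.2, 4.0, 4.6, 4.5, 4.2, 3.7]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


r = pearson_r(divorce_rate_maine, margarine_lbs_per_capita)
print(f"r = {r:.4f}")
```

The coefficient measures only how closely two series move together along a straight line. Nothing in the calculation refers to why they move, which is why a value near 1 cannot by itself support a causal claim.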
The correlation/causation fallacy is just one of many
logical fallacies that have been documented and described
over the years, including formal fallacies (fallacies of
logic) and informal fallacies (fallacies of evidence or
relevance).21
As more tools become available to visualize
data sets quickly and easily, organizations must invest
as much in critical thinking and data science expertise
as they do in tools to visualize data. Otherwise, they risk
succumbing to logical fallacies.
18. BIG DATA RAISES
MULTIPLE LEGAL AND
ETHICAL ISSUES
The good news—and the bad news—about big data is that
it can provide unprecedented insight into people, both as
individuals and in aggregate. While surveys can, arguably,
reveal human attitudes, Christian Rudder, CEO of dating
site OKCupid, points out in his 2014 book, Dataclysm:
Who We Are When We Think No One’s Looking, that “we can
pinpoint the speaker, the words, the moment, even the
latitude and longitude of human communication.”22
Many people know the story of how Target discovered
that a young girl was pregnant before her father did; such
stories have become mainstream.23
But much of the
challenge with recent discussions on ethics and privacy
stems from the extremely broad nature of these terms,
the spectrum of personal preferences, and the beliefs of
individuals about the media environment we live in today.
Consider these recent examples:
• Seeking to prevent suicides, Samaritans Radar raises
privacy concerns. In October 2014, the BBC reported
that the Samaritans had launched an app that would
monitor words and phrases such as “hate myself” and
“depressed” on Twitter and would notify users if any
of the people they follow appear to be suicidal.24
While
the app was developed to help people reach out to
those in need, privacy advocates expressed concern
that the information could be used to target and profile
individuals without their consent. According to a
petition filed on Change.org, the Samaritans app was
monitoring approximately 900,000 Twitter accounts
as of late October.25
By November 7, the app was
suspended based on public feedback.26
• Facebook’s “Emotional Contagion” experiment
provokes outrage about its methodology. In June
2014, Facebook’s Adam Kramer published a study in
Proceedings of the National Academy of Sciences,
revealing that “emotional states can be transferred
to others via emotional contagion, leading people
to experience the same emotions without their
awareness.”27
In other words, seeing negative stories
on Facebook can make you sad. The experiment
provoked outrage about the perceived lack of informed
consent, the ethical repercussions of such a study,
the concern over appropriate peer review, the privacy
implications, and the precedent such a study might
set for research using digital data.
• Uber knows when and where (and possibly with
whom) you’ve spent the night. In March 2012, Uber
posted, and later deleted, a blog post entitled “Rides of
Glory,” which revealed patterns, by city, of Uber rides
after “brief overnight weekend stays,” also known as
the passenger version of the walk of shame.28
Uber
was later criticized for allegedly revealing its “God
View” at an industry event, showing attendees the
precise location of a particular journalist without his
knowledge, while a December 1, 2014, post on Talking
Points Memo disclosed the story of a job applicant who
was allegedly shown individuals’ live travel information
during an interview.29, 30
19.
• A teenager becomes an Internet celebrity—and a
target—in one day. Alex Lee, a 16-year-old Target clerk,
became a meme (#AlexFromTarget) and a celebrity
within hours, based on a photo taken of him unawares
at work. He was invited to appear on The Ellen Show
and was reported to have received death threats on
social media.31
These stories illustrate several attributes of the data
environment we live in now and the attendant ethical
issues they represent:
• Data collection. The Samaritans example illustrates
the law of unintended consequences: what may
happen when an app collects data that may, albeit
unintentionally, compromise privacy or put people in
harm’s way.
• Methodology and usage. The Facebook example
demonstrates what happens when a company uses
its vast reservoir of data to run technically legal but
ethically ambiguous experiments on its users, raising
questions about the nature of informed consent and
ethical data use in the digital age.
• Aggregation, storage, and stewardship. The Uber posts
illustrate, albeit with aggregated data, the intensely
intimate nature of the data users entrust to companies,
raising questions of stewardship, ethics (is aggregating
such data ethical?), and privacy (what happens if data is
intentionally or accidentally disclosed?).
• Communication. All of the above examples illustrate
the gray areas between law and ethics, or, from
an organizational point of view, risk management
and customer experience. As data becomes even
more valuable and ubiquitous, the way in which
organizations communicate—about collection,
analysis, intent and usage—will affect not only their
legal risk profile, but their ability to attract and retain
the trust and loyalty of their communities.
Finally, there is, as former secretary of defense Donald
Rumsfeld so famously called it, “the unknown unknown.”
The #AlexFromTarget story demonstrates not only
how an everyday 16-year-old (by definition, a
minor) can become an instant Internet celebrity but also
how a company can unwittingly and suddenly find itself
at the center of a crisis not of its own creation, one that
raises issues (compounded because of Lee’s age) of
employee privacy and even safety.
Figure 5 lays out these issues at a high level.
In the past, many of these ethical issues related to data
were cloaked behind proprietary systems and siloed data
stores. As data becomes ubiquitous, more integrated,
and more portable, however, the number and type of
ethical gray areas will multiply, along with a need to
distinguish the organization’s legal responsibilities, such
as what it discloses in a terms of service, from its ethical
ones—the actions it takes that promote or erode the trust
of its community.
FIGURE 5 ETHICAL ISSUES RELATED TO DATA
Actions covered in the figure: Data Collection; Methodology; Usage; Aggregation; Storage & Stewardship.
Issues listed in the figure:
• Data sources, data types, and sample size
• How the data may have been filtered, enriched, or otherwise modified with demographic, location, or other metadata
• Keyword selection
• Human or algorithmic coding
• Process for assessing precision, relevance, and recall
• How the organization may change the experience based on data
• Whether the organization plans to sell the data in any form to a third party
• How data is combined and its impact on personally identifiable information (PII) or user experience in general
• What data is collected
• How and for how long data is stored
• Who owns the data
• Who has the right to delete data (posts or entire profiles)
• Process for deleting data (posts or entire profiles)
• Who has the right to view/modify/share data (administration)
• Whether and how the data can be extracted
Communication: The extent to which the organization proactively and transparently informs users/customers about what and how it collects, analyzes, stores, aggregates, and uses their data.
1. Define data strategy and operating model
If data is to be considered a business-critical asset, it must be treated as such by leaders who drive and
instill strategy across the organization. In 2015, leaders must define what critical data streams are needed
to drive business goals, how they will source them, and what operating model is needed to process,
interpret, and act on them at the right time.
The challenge is that an organization’s departments (and therefore the data) tend to be siloed, which can
result in blind spots, organizational politics, and spiraling costs. Organizations must balance their need
for insight and competitive advantage on the one hand and privacy and rational cost of ownership on the
other. All too frequently, these dual imperatives are in conflict, sometimes unnecessarily so, because the
organization does not have a clear strategy for what data will be used and stored, what data will be used
but not stored, and what data is simply unnecessary.
2. Update analytics methodology to reflect new data realities
Analyzing unstructured data will never yield the same confidence levels as a simple binary choice; it will
always require interpretation. The key is to make that interpretation transparent, rigorous, and repeatable so
that others can reliably repeat analyses and yield the same or substantially similar results.
This is one area in which there is a tremendous difference between private and public institutions. In
private institutions, work process, product, and data tend to be proprietary. In public institutions, such as
universities, research is subject to the highest levels of scrutiny among academic publications and journals.
It’s also important to engineer the method of measurement into initiatives to reduce ambiguity and provide
a greater ability to trace impact. The broader the topic, the more hashtags can help confirm the provenance
and relevance of social conversation. Tracking codes and multivariate testing are also useful, if
imperfect, solutions.
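One way to make the interpretation of human-coded data repeatable is to measure how often two independent coders agree beyond what chance alone would produce. The sketch below computes Cohen's kappa, the statistic referenced in endnote 12, for two coders labeling the same items. The function name, the label values, and the sample data are hypothetical and chosen only for illustration; this is a minimal sketch, not the methodology of any study cited in this report.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Return Cohen's kappa for two coders who labeled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)

    # Observed agreement: share of items on which the two coders match.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Expected agreement by chance, from each coder's own label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used one identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical example: two coders classify eight posts as relevant or spam.
coder_a = ["relevant", "relevant", "spam", "relevant",
           "spam", "spam", "relevant", "spam"]
coder_b = ["relevant", "spam", "spam", "relevant",
           "spam", "relevant", "relevant", "spam"]

kappa = cohens_kappa(coder_a, coder_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this made-up sample the coders agree on six of eight posts, but because half of that agreement would be expected by chance, kappa is lower than the raw agreement rate. Reporting a figure like this alongside the coding rules gives others a concrete benchmark to match when they repeat the analysis.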
3. Seek out critical thinking and diverse skill sets
Unquestionably, engineering and analytical skills, not to mention skills in applied statistics and data science,
will continue to gain value as organizations become ever more dependent on multiple data types. At the
same time, the demands of analyzing unstructured data also require skill in interpreting context related
to language and behavior, a challenge humans have had since we developed language. After all, even the
cleanest, most reliable data can be misinterpreted, whether intentionally or unintentionally.
Minimizing misinterpretation means valuing not only math and engineering but also social sciences and
humanities. These disciplines—sociology, psychology, anthropology, linguistics, ethics, philosophy, and
rhetoric—provide context and help us become better critical thinkers. Without a balance of critical thinking,
business knowledge, and smart analytics tools, we’re in danger of making the wrong decision much more
efficiently, quickly, and with far greater impact than we have in the past.
If we—individually and collectively—are to make the best use of data and extract relevant
insight from it in a trustworthy manner, we must approach data strategy thoughtfully.
Following are some basic tenets of a strategic data plan.
CONCLUSION
The hype over “big data” has partially obscured the fact that our ability to collect, analyze,
and act on data—and to some extent predict outcomes based upon it—is a potentially
transformative force for business and humanity alike. While Aldous Huxley couldn’t have
anticipated the impact of a Kim Kardashian magazine cover or the challenges inherent
in understanding how people talk about smoking, he was prescient to call out the ever-
increasing difficulty of identifying relevance in a “sea of irrelevance.”32
It seems likely that the privacy and ethical implications of data ubiquity, not to mention
recent disclosures about government access to and use of personal data, would have
confirmed many of Orwell’s worst fears. At the same time, however, we do not need to
blindly accept the dystopian nightmare he envisioned as our only future. We have an
opportunity, and an obligation, to examine not only the legal but also the ethical implications
of ubiquitous data, and use this understanding to decide how we will use it, sustainably
and responsibly, for years to come.
4. Insist on ethical data use and transparent disclosure
Earl Warren, former chief justice of the United States, once said, “In civilized life, law floats in a sea of
ethics.”33
This is especially true of the digital age, in which few of the implications of digital transformation
have found their way into case law and, as a result, organizational policy. As organizations become
more data centric, for their own benefit as well as their customers’, they must also look closely at
the affirmative and passive decisions they make about where they get their data; their analytics
methodology; how they store, steward, aggregate, and use the data; and how transparently they disclose
these actions.
5. Reward and reinforce humility and learning
It is nearly impossible to calculate the impact that data will have in our lives in the next decade.
Technologies such as IBM’s Watson, Ayasdi, and others are illustrating the many applications for big
data, whether in healthcare, consumer products, financial services, energy, or elsewhere. Meanwhile, the
Internet of Things introduces data feeds from sensors, which can be combined with other data streams
to deliver specific, relevant, and even predictive insights that will only compound volume, velocity, and
variety challenges.
Yet the world is just starting to come to terms with the impact of data ubiquity from the technology,
business, research, cultural, and ethical perspectives. The most important and perhaps most difficult
impact of data ubiquity is the fact that it radically undermines traditional methods of analysis and laughs
at our desire for certainty. The only strategy to combat the fear of uncertainty is to accept and work with
the limits of the data and approach the science of challenging data sets with an appetite for continuous
learning, whether the goal is to sell a pair of shoes or to help prevent cancer.
ENDNOTES
1
You can view the talk at http://www.ted.com/talks/susan_
etlinger_what_do_we_do_with_all_this_big_data.
2
Neil Postman, Amusing Ourselves to Death: Public Discourse in the
Age of Show Business (New York: Penguin Books,1985), vii.
3
For a more detailed view, a good starting point is “3D Data
Management: Controlling Data Volume, Velocity and Variety,”
published by META Group on February 6, 2001, http://blogs.
gartner.com/doug-laney/files/2012/01/ad949-3D-Data-
Management-Controlling-Data-Volume-Velocity-and-Variety.pdf.
4
“What Is Big Data?” IBM, accessed January 6, 2015, http://www-
01.ibm.com/software/data/bigdata/what-is-big-data.html.
5
“Statistics,” YouTube, accessed January 6, 2015, https://www.
youtube.com/yt/press/statistics.html.
6
“Stats,” WordPress, cached on November 2, 2014, http://
sq.wordpress.com/stats/.
7
“About,” Twitter, accessed January 6, 2015, https://about.twitter.
com/company.
8
Darin Stewart, “Big Content: The Unstructured Side of Big Data,”
Gartner Group, May 1, 2013, http://blogs.gartner.com/darin-
stewart/2013/05/01/big-content-the-unstructured-side-of-big-
data/.
9
Zacks Equity Research, “Stock Market News for
December 17, 2014 - Market News,” Yahoo! Finance,
December 17, 2014, http://finance.yahoo.com/news/
stock-market-news-december-17-151003130.html;_
ylt=AwrBJSCwLpNUWlIAatyTmYlQ.
10
Kalev Leetaru, “Why Big Data Missed the Early Warning Signs of
Ebola,” Foreign Policy, September 26, 2014, http://foreignpolicy.
com/2014/09/26/why-big-data-missed-the-early-warning-signs-
of-ebola/.
11
See also: Sherry L. Emery, Glen Szczypka, Eulàlia P. Abril,
Yoonsang Kim, and Lisa Vera, “Are You Scared Yet? Evaluating
Fear Appeal Messages in Tweets About the Tips Campaign,”
Journal of Communication, 64 (2014): 278–295, doi: 10.1111/
jcom.12083.
12
“Cohen’s Kappa,” University of Nebraska–Lincoln, accessed
January 6, 2015, http://psych.unl.edu/psycrs/handcomp/
hckappa.PDF.
13
Sherry L. Emery, “Are You Scared Yet?”
14
Ibid.
15
Ibid.
16
“Linguistic Mapping Reveals How Word Meanings Sometimes
Change Overnight,” MIT Technology Review, November 23, 2014,
http://www.technologyreview.com/view/532776/linguistic-
mapping-reveals-how-word-meanings-sometimes-change-
overnight/.
17
Philip Stark, Twitter comment, November 24, 2014, https://
twitter.com/philipbstark/status/536955754163363840.
18
For a quick primer on descriptive, predictive, and prescriptive
analytics, see this interview with data scientist Michael Wu
of Lithium by Jeff Bertolucci in InformationWeek: http://www.
informationweek.com/big-data/big-data-analytics/big-
data-analytics-descriptive-vs-predictive-vs-prescriptive/d/d-
id/1113279.
19
To download the text, go to http://classics.mit.edu/Aristotle/
sophist_refut.html.
20
Vigen maintains a running list of spurious correlations at his
blog, Spurious Correlations (http://tylervigen.com/).
21
For an excellent tutorial on logical fallacies, see chapter 2 of
“SticiGui,” an online statistics textbook by Philip B. Stark, professor
and chair of the department of statistics, University of California,
Berkeley: http://www.stat.berkeley.edu/~stark/SticiGui/Text/
reasoning.htm.
22
Rudder, Dataclysm, 146.
23
Kashmir Hill, “How Target Figured Out a Teen Girl Was Pregnant
Before Her Father Did,” Forbes, February 16, 2012, http://www.
forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-
out-a-teen-girl-was-pregnant-before-her-father-did/.
24
Zoe Kleinman, “Samaritans App Monitors Twitter Feeds for
Suicide Warnings,” BBC News, October 28, 2014, http://www.bbc.
com/news/technology-29801214.
25
Adrian Short, “Shut Down Samaritans Radar,” Change.org,
accessed January 6, 2015, https://www.change.org/p/twitter-
inc-shut-down-samaritans-radar.
26
“Samaritans Radar announcement - Friday 7 November,”
Samaritans, November 7, 2014, http://www.samaritans.org/
news/samaritans-radar-announcement-friday-7-november.
27
Adam D.I. Kramer, Jamie E. Guillory, and Jeffrey T. Hancock,
“Experimental Evidence of Massive-Scale Emotional Contagion
Through Social Networks,” Proceedings of the National Academy
of Sciences of the United States of America, vol. 111 (24), DOI:
10.1073/pnas.1320040111.
28
Voytek, “Rides of Glory,” Uber, cached March 26, 2012, https://
web.archive.org/web/20140828024924/http://blog.uber.com/
ridesofglory.
29
Kashmir Hill, “‘God View’: Uber Allegedly Stalked Users for
Party-Goers’ Viewing Pleasure (Updated),” Forbes, October 3,
2014, http://www.forbes.com/sites/kashmirhill/2014/10/03/
god-view-uber-allegedly-stalked-users-for-party-goers-viewing-
pleasure/.
30
Caitlin MacNeal, “Report: Uber Let Job Applicant Access
Controversial ‘God View’ Mode,” Talking Points Memo, December
1, 2014, http://talkingpointsmemo.com/livewire/uber-job-
applicant-ride-logs.
31
Nick Bilton, “Alex from Target: The Other Side of Fame,” The
New York Times, November 12, 2014, http://www.nytimes.
com/2014/11/13/style/alex-from-target-the-other-side-of-fame.
html?_r=0.
32
Aldous Huxley, Brave New World Revisited (New York:
HarperCollins Publishers, 1958), 36.
33
Earl Warren, speech at the Louis Marshall Award Dinner of the
Jewish Theological Seminary (Americana Hotel, New York City,
November 11, 1962).
SOURCES AND ACKNOWLEDGMENTS
This document was developed as a companion piece to a talk
given at TED@IBM in San Francisco, California, on September 23,
2014. As such, it was built on online and in-person conversations
with market influencers, technology vendors, brands, academics,
and others on the effective and ethical use of big data, as well as
secondary research, including relevant and timely books, articles,
and news stories. My deepest gratitude to the following:
• The team at the Health Media Collaboratory at the University
of Illinois at Chicago, specifically Sherry Emery, Eman Aly, and
Glen Szczypka for sharing their research and methodology and
educating me about the nuances of interpreting big data for
medical research.
• My fellow board members at the Big Boulder Initiative for
their insights and perspective on the effective and ethical
use of social data: Pernille Bruun-Jensen, CMO, NetBase;
Damon Cortesi, Founder and CTO, Simply Measured; Jason
Gowans, Director, Data Lab, Nordstrom; Will McInnes, CMO,
Brandwatch; Chris Moody, Vice President, Data Strategy,
Twitter (Chair); Stuart Shulman, Founder and CEO, Texifier;
Carmen Sutter, Product Manager, Social, Adobe; and Tom
Watson, Head of Sales, Hanweck Associates, LLC.
• The team at TED who helped me hone and focus the talk and
provided invaluable feedback throughout: Juliet Blake
and Anna Bechtol.
• The team at IBM Social Business for planning, executing and
marketing a superb event: Michela Stribling, Beth McElroy,
Jacqueline Saenz and Michelle Killebrew.
• My fellow TED@IBM speakers: Gianluca Ambrosetti, Kare
Anderson, Brad Bird, Monika Blaumueller, Erick Brethenoux,
Lisa Seacat DeLuca, Jon Iwata, Bryan Kramer, Tan Le, Charlene
Li, Florian Pinel, Inhi Cho Suh, Marie Wallace,
and Kareem Yusuf.
• Philip Stark, professor and chair of Statistics, University of
California, Berkeley, for an extremely insightful perspective on
the methodological and organizational requirements of big
data, as well as access to his superb course materials.
• The organizers and speakers at the International Symposium
on Digital Ethics at Loyola University in November 2014, with
whom I had some incredibly insightful conversations: Don
Heider, dean, School of Communication, Loyola University
Chicago; Thorsten Busch, senior research fellow, Institute for
Business Ethics, University of St. Gallen; Michael Koliska, PhD
candidate at University of Maryland; and Caitlin Ring, assistant
professor of strategic communication at Seattle University.
• Farida Vis, research fellow in the Social Sciences in the
Information School at the University of Sheffield.
• The teams at DataSift (Nick Halstead, Tim Barker, Jason Rose,
Seth Catalli); Lithium Technologies (Katy Keim and Nicol
Addison); and Oracle (Tara Roberts and Christine Wan) for
valuable insights along the way.
• Tyler Vigen for his Spurious Correlations blog, which makes a
complex topic simple and fun to explain; Gary Schroeder for
his wonderful visual storytelling of my TED talk; Daniel K. Davis
for his superb photography at TED@IBM; Vladimir Mirkovic for
graphic design; and Erin Brenner for copyediting.
• My talented teammates at Altimeter Group: Rebecca Lieb, who
edited this report, Cheryl Graves, Jessica Groopman, Jaimy
Szymanski, Christine Tran, and, of course, Charlene Li.
Input into this document does not represent a complete
endorsement of the report by the individuals or organizations
listed above. Finally, any errors are mine alone.
OPEN RESEARCH
This independent research report was 100% funded by Altimeter
Group. This report is published under the principle of Open
Research and is intended to advance the industry at no cost. This
report is intended for you to read, utilize, and share with others; if
you do so, please provide attribution to Altimeter Group.
PERMISSIONS
The Creative Commons License is Attribution-Noncommercial-
ShareAlike 3.0 United States, which can be found at https://
creativecommons.org/licenses/by-nc-sa/3.0/us/.
DISCLAIMER
ALTHOUGH THE INFORMATION AND DATA USED IN THIS REPORT HAVE BEEN PRODUCED
AND PROCESSED FROM SOURCES BELIEVED TO BE RELIABLE, NO WARRANTY EXPRESSED
OR IMPLIED IS MADE REGARDING THE COMPLETENESS, ACCURACY, ADEQUACY, OR USE
OF THE INFORMATION. THE AUTHORS AND CONTRIBUTORS OF THE INFORMATION AND
DATA SHALL HAVE NO LIABILITY FOR ERRORS OR OMISSIONS CONTAINED HEREIN OR FOR
INTERPRETATIONS THEREOF. REFERENCE HEREIN TO ANY SPECIFIC PRODUCT OR VENDOR
BY TRADE NAME, TRADEMARK, OR OTHERWISE DOES NOT CONSTITUTE OR IMPLY ITS
ENDORSEMENT, RECOMMENDATION, OR FAVORING BY THE AUTHORS OR CONTRIBUTORS
AND SHALL NOT BE USED FOR ADVERTISING OR PRODUCT ENDORSEMENT PURPOSES.
THE OPINIONS EXPRESSED HEREIN ARE SUBJECT TO CHANGE WITHOUT NOTICE.
About Us
How to Work with Us
Altimeter Group research is applied and brought to life in our client engagements. We help organizations understand and
take advantage of digital disruption. There are several ways Altimeter can help you with your business initiatives:
• Strategy Consulting. Altimeter creates strategies and plans to help companies act on disruptive business and
technology trends. Our team of analysts and consultants works with senior executives, strategists, and marketers on
needs assessment, strategy roadmaps, and pragmatic recommendations across disruptive trends.
• Education and Workshops. Engage an Altimeter speaker to help make the business case to executives or arm
practitioners with new knowledge and skills.
• Advisory. Retain Altimeter for ongoing research-based advisory: conduct an ad-hoc session to address an immediate
challenge; or gain deeper access to research and strategy counsel.
To learn more about Altimeter’s offerings, contact sales@altimetergroup.com.
Altimeter is a research and
consulting firm that helps
companies understand and
act on technology disruption.
We give business leaders the
insight and confidence to help
their companies thrive in the
face of disruption. In addition to
publishing research, Altimeter
Group analysts speak and
provide strategy consulting
on trends in leadership, digital
transformation, social business,
data disruption and content
marketing strategy.
Altimeter Group
1875 S Grant St #680
San Mateo, CA 94402
info@altimetergroup.com
www.altimetergroup.com
@altimetergroup
650.212.2272
Susan Etlinger, Industry Analyst
Susan Etlinger is an industry analyst at Altimeter Group,
where she works with global organizations to develop
data and analytics strategies that support their business
objectives. Susan has a diverse background in marketing
and strategic planning within both corporations and
agencies. She’s a frequent speaker on social data and
analytics and has been extensively quoted in outlets,
including Fast Company, BBC, The New York Times, and The
Wall Street Journal. Find her on Twitter at @setlinger and at
her blog, Thought Experiments, at susanetlinger.com.
Rebecca Lieb, Industry Analyst
Rebecca Lieb (@lieblink) covers digital advertising and
media, encompassing brands, publishers, agencies and
technology vendors. In addition to her background as a
marketing executive, she was VP and editor-in-chief of the
ClickZ Network for over seven years. She’s written two
books on digital marketing: The Truth About Search Engine
Optimization (2009) and Content Marketing (2011). Rebecca
blogs at www.rebeccalieb.com/blog.