BIG DATA IN NATO: WHAT 
IT MEANS TO YOU 
Jay Gendron 
jaygendron 
October 29, 2014
Our Journey 
• BIG DATA 
– What is BIG? 
– What is DATA? 
• Two Workhorses of Big Data 
– Enterprise Data Warehousing 
– Predictive Analytics 
• Weaponry Available 
– Open Source Tools 
– Open Source Data 
• Use Cases 
• Future Trends
BIG DATA: HOW 
BIG IS BIG?
Need something small… 
Let 1 byte = 1 paper thickness 
Image: 
http://pencilgrinder.wordpress.com/ 
20lb copy paper = 0.004” = 0.1 mm
01000010 01101001 01100111 00100000 
01000100 01100001 01110100 01100001 
00100000 01101001 01101110 00100000 
01001110 01000001 01010100 01001111 
00111010 00100000 01010111 01101000 
01100001 01110100 00100000 01101001 
01110100 00100000 01001101 01100101 
01100001 01101110 01110011 00100000 
01110100 01101111 00100000 01011001 
01101111 01110101 00101110 
This is 39 bytes… 
or in our thinking… 
a pile of paper 39 pages thick 
So 1 MB is a million sheets thick…
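The byte count can be checked directly. The sketch below assumes the bit groups on the slide are 8-bit ASCII codes; the names `BITS` and `decode` are illustrative and not part of the original deck.

```python
# The bit groups from the slide, joined into one string (assumed to be 8-bit ASCII).
BITS = (
    "01000010 01101001 01100111 00100000 "
    "01000100 01100001 01110100 01100001 "
    "00100000 01101001 01101110 00100000 "
    "01001110 01000001 01010100 01001111 "
    "00111010 00100000 01010111 01101000 "
    "01100001 01110100 00100000 01101001 "
    "01110100 00100000 01001101 01100101 "
    "01100001 01101110 01110011 00100000 "
    "01110100 01101111 00100000 01011001 "
    "01101111 01110101 00101110"
)


def decode(bits: str) -> bytes:
    """Turn space-separated 8-bit groups into raw bytes."""
    return bytes(int(group, 2) for group in bits.split())


message = decode(BITS)
text = message.decode("ascii")
sheets = len(message)  # the deck's analogy: 1 byte = 1 sheet of paper

print(text)
print(f"{sheets} bytes -> a pile {sheets} sheets thick")
```

The decoded text is the title of the talk, one sheet per character.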
1 MB 
Imagine the front square 
Completely covered with 10,000 
sheets 
Stacked 100 sheets high 
That’s under one-half inch thick 
…1 GB?
33 feet tall 
10 meters
266 feet tall 
81 meters 
So what does 1 TB look like?
1 TB 
6.3 miles (10 km) high. Plane! Duck!
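The heights on these slides follow from the paper analogy. A minimal sketch, assuming decimal units (1 MB = 10^6 bytes), 0.1 mm per sheet, and the sheets spread over 10,000 equal piles as on the 1 MB slide; `stack_height_m` is a hypothetical helper name.

```python
SHEET_MM = 0.1    # thickness of one sheet of 20 lb copy paper, from the slide
STACKS = 10_000   # sheets laid side by side to cover the square


def stack_height_m(n_bytes: int, stacks: int = STACKS) -> float:
    """Height in metres when n_bytes sheets are split over `stacks` equal piles."""
    sheets_high = n_bytes / stacks
    return sheets_high * SHEET_MM / 1000.0  # mm -> m


for label, n_bytes in [("1 MB", 10**6), ("1 GB", 10**9), ("1 TB", 10**12)]:
    print(f"{label}: {stack_height_m(n_bytes):,.2f} m")
```

Each step up in unit multiplies the height by 1,000, which is why the picture goes from a thin layer of paper to a multi-storey stack to airliner altitude.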
BIG data 
There is no technical definition 
“Big Data is at the heart of modern science and business…the 
necessity of grappling with Big Data, and the desirability of 
unlocking the information hidden within it, is now a key theme in 
all the sciences – arguably the key scientific theme of our times.” 
-Francis X. Diebold, University of Pennsylvania 
A Personal Perspective on the 
Origin(s) and Development of “Big Data”: 
The Phenomenon, the Term, and the Discipline (2012) 
3V’s = Volume 
Velocity 
Variety 
Laney, D. (2001)
Volume 
Large Synoptic Survey 
Telescope (LSST) 
40TB/day 
100+PB in 10-year lifetime 
Illumina HiSeq 2000 
DNA Sequencer 
~1TB/day; 30 TB/month 
Images: https://d396qusza40orc.cloudfront.net/datasci/lecture_slides/week1/005_escience.pdf
250 miles Exosphere 
186 miles Thermosphere 
25 miles Mesosphere 
6 miles Troposphere 
40 TB: 10,000 x 4,000,000,000 sheets 
high
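The comparison with the atmosphere can be reproduced with the same paper analogy. This is a sketch under the deck's assumptions (0.1 mm per sheet, 10,000 piles, decimal terabytes); the variable names are illustrative.

```python
SHEET_MM = 0.1          # one sheet of 20 lb copy paper, per the earlier slide
STACKS = 10_000         # piles covering the square
KM_PER_MILE = 1.609344  # international mile

daily_bytes = 40 * 10**12                       # 40 TB/day, the LSST figure
sheets_high = daily_bytes // STACKS             # sheets in each pile
height_km = sheets_high * SHEET_MM / 1_000_000  # mm -> km
height_miles = height_km / KM_PER_MILE

print(f"{sheets_high:,} sheets high")
print(f"about {height_km:,.0f} km, or {height_miles:,.0f} miles")
```

One day of telescope data, stacked this way, reaches roughly the altitude the slide labels as the exosphere.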
How Big is the Internet? 
Size of the Internet as of 31st Dec 2013 
14.3 Trillion - Webpages, live on the Internet. 
48 Billion - Webpages indexed by Google Inc. 
14 Billion - Webpages indexed by Microsoft's Bing. 
672 Exabytes - 672,000,000,000 Gigabytes (GB) of accessible data. 
Source: http://www.factshunt.com/2014/01/total-number-of-websites-size-of.html 
1 EB = 1,000,000,000,000,000,000 = 1
Velocity 
43.6 EB = Total Internet Traffic in 2013
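Velocity is a rate, so the annual total is easier to picture per second. A rough sketch, assuming decimal exabytes (1 EB = 10^18 bytes) and traffic spread evenly over the year, which real traffic is not.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 2013 was not a leap year

traffic_bytes = 43.6e18                # 43.6 EB, the figure on the slide
bytes_per_second = traffic_bytes / SECONDS_PER_YEAR

print(f"average rate: {bytes_per_second / 1e12:.2f} TB per second")
```

In the paper analogy, that is more than one 10 km stack arriving every second.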
Volume + Velocity + 
Variety 
Image: http://www.mediabistro.com/alltwitter/files/2011/06/internet-60-seconds-infographic.jpg
DATA IS…
Data is… 
…that which aids 
Decision Making
The “WebMaster” 
Image: http://www.fivem.be/
The “DataMaster” 
• Hiring a room of PhDs won’t solve 
Big Data 
• They have a role…as does IT 
• Ultimately Big Data will also be a 
team effort like the web buildout 
• …and You have a role on that team 
Image: http://www.fivem.be/
BIG DATA: THE 
WORKHORSES
Difficult Data is more apt 
Enterprise Data 
Warehousing 
Images: Elephant - http://www.marcolotz.com/?p=77 
Word Cloud - http://www.fotolia.com/id/36647313?by=serie 
Predictive 
Analytics
Enterprise Data Warehousing 
• Began with MapReduce in 2004 
• Evolved in open source projects like Hadoop 
• Permanent contributions of evolution: 
– Fault tolerance – running on many machines 
and accounting for failures 
– Schema-on-Read – more flexibility in working 
with data in different forms 
– User Defined Functions – giving developers 
more freedom in where to place queries 
Source: https://class.coursera.org/datasci-002/lecture/15
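The MapReduce pattern behind these points can be sketched in a few lines. This is a single-machine toy in plain Python, not Hadoop's API: `map_phase` and `reduce_phase` stand in for user-defined functions, the raw lines are only given structure when read (schema-on-read), and fault tolerance is not modelled at all. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """User-defined map: impose structure (words) on a raw line as it is read."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key; a real framework does this across many machines."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """User-defined reduce: collapse all values for one key into a total."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["Big Data in NATO", "big data big team"])
print(counts)
```

Because each map call sees one record and each reduce call sees one key, the work can be split across machines and a failed piece rerun on its own, which is where the fault tolerance in a real framework comes from.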
Predictive Analytics 
• Business Intelligence 
• Statistics 
• Visualization 
• Programming 
• Machine Learning
Desired End State 
Image: http://www.ted.com/talks/nate_silver_on_race_and_politics?language=en
A team approach 
BI 
Predict 
Stats 
Viz Analyst 
S/W 
+ scale 
+ algorithm 
+ statistics 
+ programming + data products
Impact of the Phenomenon 
Theoretical 
Empirical 
Empirical + Computational 
Images: Galileo - http://www.crystalinks.com/galileo.html; Formulae - https://msschwarzeducationstation.wordpress.com/page/2/; Computers - 
http://www.utsystem.edu/blog/2011/09/26/ut-austin-awarded-50-million-build-faster-more-powerful-supercomputer; Book Cover - 
http://radar.oreilly.com/2011/09/building-data-science-teams.html
Human-Computer Symbiosis 
Sankar, S. (2012, June). The rise of human-computer cooperation [TEDGlobal 2012]. Podcast retrieved from 
https://www.ted.com/talks/shyam_sankar_the_rise_of_human_computer_cooperation?language=en
People and Culture Count 
People are “tinkerers” & “hobbyists” 
Experience Perspective 
We ALL have 1 thing in common 
WE ARE ALL DIFFERENT!
THE WEAPONRY: 
OPEN SOURCES
Source: Sankar (2012). https://www.ted.com/talks/shyam_sankar_the_rise_of_human_computer_cooperation?language=en
Repurposing Data
Open Source Tools
Visual Text Analytics 
Image: http://hermeneuti.ca/voyeur/tools
Open Datasets
Again…team approach 
SCIENCE 
• Stats 
• S/W 
DOMAINS 
• SME 
• BI 
INFO TECH 
• DBA 
• Viz 
TTTTT
Skill Building 
• Free Courses 
• Meetups 
• Hackathons 
• Podcasts 
Image: http://www.amazon.com/Data-Science-Business-data-analytic-thinking/dp/1449361323
…to make the point 
…on page 13 
Fawcett, T. & Provost, F. (2013, August 9). Data Science for Business. Retrieved from 
https://www.safaribooksonline.com/library/view/data-science-for/9781449374273/.
DATA USE CASES
Remember…You have a role 
Informal poll at Univ of WA: 
How much time do you spend 
“handling data” as opposed to 
“doing science”? 
Most given response: 90%
Image: http://www.ibmbigdatahub.com/infographic/infographic-big-data-exploration
Image: http://www.ibmbigdatahub.com/infographic/infographic-enhanced-360-view-customer
Image: http://www.ibmbigdatahub.com/infographic/security-intelligence-extension
Fraud Detection 
Image: http://www.ibmbigdatahub.com/infographic/countering-fraud-big-data-world
Human-Computer Symbiosis 
Sankar, S. (2012, June). The rise of human-computer cooperation [TEDGlobal 2012]. Podcast retrieved from 
https://www.ted.com/talks/shyam_sankar_the_rise_of_human_computer_cooperation?language=en
Image: http://www.ibmbigdatahub.com/infographic/operations-analytics
Data Warehouse 
Augmentation 
Existing + NEW 
Operational Efficiencies 
More Data 
Images: Arrow - http://canadawebservices.com/7-powerful-ways-increase-website-traffic/; Explore - 
http://www.keadventure.com/page/explore_more!.html
Leveraging Use Cases 
Business and Commerce 
• Big Data Exploration 
• 360 Degree View 
• Security & Intelligence 
• Operational Analysis 
• Data Warehouse 
Augmentation 
NATO Enterprise 
• FMN (“experiment” and 
data mining) 
• Enterprise (supporting 
commands) 
• Cyber Defence (threat 
tactics) 
• C2 (requirements text 
analysis) 
• Ent. Architecture & 
Technology (req’ts)
FUTURE TRENDS
According to Gartner 
Image: http://www.forbes.com/sites/gilpress/2014/08/18/its-official-the-internet-of-things-takes-over-big-data-as-the-most-hyped-technology/
According to IBM 
• More Analytics – Less Gut 
• Data security and privacy 
• Leaders with data knowledge 
• Data-centric applications 
• Integrating internal and external 
• Investments in platforms 
An example 
Cho, I. (2013, February 3). 6 trends in big data and analytics [IBM Big Data Hub]. Podcast retrieved from 
http://www.ibmbigdatahub.com/podcast/6-trends-big-data-and-analytics
Visual Analytics 
Koblin, A. (2011, March). Visualizing ourselves ... with crowd-sourced data [TED2011]. Podcast retrieved from 
http://www.ted.com/talks/aaron_koblin?language=en
Art and science meet 
Miebach, N. (2011, July). Art made of storms [TEDGlobal 2011]. Podcast retrieved 
from http://www.ted.com/talks/nathalie_miebach?language=en
Summary 
• Big Data – has implications 
• Big Data = Data + Analytics 
• Open Source – tools and data 
• Use Case – leverage others’ results 
• Future 
– More analytics and applications 
– Need for data fluency among managers 
– Need processes to encourage exploring
You know 
WHAT
…the SO 
WHAT
NOW 
WHAT?
Empower 
Make apps 
Find data
References 
Cho, I. (2013, February 3). 6 trends in big data and analytics [IBM Big Data Hub]. Podcast retrieved from 
http://www.ibmbigdatahub.com/podcast/6-trends-big-data-and-analytics. 
Diebold, F.X. (2012, November 26). A personal perspective on the origin(s) and development of “big data”: 
The phenomenon, the term, and the discipline. Retrieved from 
http://www.ssc.upenn.edu/~fdiebold/papers/paper112/Diebold_Big_Data.pdf. 
Fawcett, T. & Provost, F. (2013, August 9). Data Science for Business. Retrieved from 
https://www.safaribooksonline.com/library/view/data-science-for/9781449374273/. (on page 13) 
Howe, B. (2013). Data science in science [PDF document]. Retrieved from Lecture Notes Online Web site: 
https://d396qusza40orc.cloudfront.net/datasci/lecture_slides/week1/005_escience.pdf. 
Koblin, A. (2011, March). Visualizing ourselves ... with crowd-sourced data [TED2011]. Podcast retrieved from 
http://www.ted.com/talks/aaron_koblin?language=en. 
Laney, D. (2001, February 6). 3-D data management: Controlling data volume, velocity and variety. META Group Research Note. Retrieved from 
http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf. 
Miebach, N. (2011, July). Art made of storms [TEDGlobal 2011]. Podcast retrieved from 
http://www.ted.com/talks/nathalie_miebach?language=en. 
Sall, E. (2013, February 12). Top 5 big data use cases [IBM Big Data Hub]. Podcast retrieved from 
http://www.ibmbigdatahub.com/podcast/top-5-big-data-use-cases. 
Sankar, S. (2012, June). The rise of human-computer cooperation [TEDGlobal 2012]. Podcast retrieved from 
https://www.ted.com/talks/shyam_sankar_the_rise_of_human_computer_cooperation?language=en.
