1
Beyond Big Data


   Riding the
Technology Wave

                  Ira A. (Gus) Hunt
      !
           Chief Technology Officer
Our Mission
We are the nation's first line of defense. We accomplish
what others cannot accomplish and go where others
cannot go. We carry out our mission by:

   Collecting information that reveals the plans, intentions and
   capabilities of our adversaries and provides the basis for
   decision and action.

   Producing timely analysis that provides insight, warning and
   opportunity to the President and decisionmakers charged with
   protecting and advancing America's interests.

   Conducting covert action at the direction of the President to
   preempt threats or achieve US policy objectives.
4 Big Bets
1	
     Revolutionize Big               Data Exploitation
          –  Acquire, federate, secure and exploit. Grow the haystack, magnify the needles.




2	
     Accelerate Operational                              Excellence
          –  Innovate IT operations and run IT like a business.




3	
     Serve CIA by supporting the IC
          –  Assume a leadership role in IC activities that matter to CIA; Build to share




4	
     Drive Performance through Talent                                 Management
          –  Focus on continuous learning and diversity of thought, experience, background
6 Key Technology Enablers
0	
     Secure Mobility
           –    Immediate, secure and appropriate access to people, data and tools from anywhere at anytime



1	
     Advanced Mission Analytics—Analytics as a Service
           –    World-class abilities to discover patterns, correlate information, understand plans and intentions,
                and find and identify operational targets in a sea of data. Big Data analytics as a service


2	
     Enterprise Widgets and Services
           –    A customizable, integrated and adaptive webtop that lets analysts, ops officers, and
                targeters to “have it their way”. Personalization in context.


3	
     Security as a Service
           –    One environment, all data, protected and secure.--ubiquitous encryption, enterprise
                authentication, audit, DRM, secure ID propagation, and Gold Version C&A.



4	
      Data Harbor—Data as a Service
           –    An ultra-high performance data environment that enables CIA missions to acquire,
                federate, and position and securely exploit huge volumes data. Data in context.


5	
     Cloud Computing—Infrastructure as a Service
           –    Capacity ahead of demand. Large scale, elastic, commodity hosting, storage, and compute
It’s a

Big Data
  World


           6
Google
       > 100 PB
  > 1T indexed URLs
  > 3 million servers
> 7.2B page-views/day
                        7
FaceBook
      > 1 billion users
   > 300PB; +> 500TB/day
> 35% of world’s photographs

                          8
YouTube
        > 1000PB
  +>72 hours/minute
>37 million hours/year
 > 4 billion views/day
                         9
World Population
  > 7,057,065,162


                    10
Twitter
> 124B tweets/year
    > 390M/day
     ~4500/sec

                     11
Global Text Messages
     > 6.1T per year
   > 193,000 per second
  > 876 per person per year


                              12
US Cell Calls
   > 2.2 T minutes/year
  > 19 minutes / person / day
(uncompressed < 1 YouTube/year)


                                13
3
Driving Forces
                 14
Social
   Mobile
         Cloud
                 15
+   +      =
Big Data
               16
+       +
Increases the velocity of
  innovation
                        17
+      +
Accelerates social
   Change
                     18
19
+      +
 Altered the
   Flow
of Information
                 20
3
Emerging
 Forces
Nano
   Bio
       Sensors
                 22
Mobile Sensor Platform
       Microphone
       Image
       3-axis accelerometer
       Touch
       Light
       Proximity
       Geolocation

Communicator, Tricorder, Transporter
                                       23
Mobile Health Platform

Pacemaker
Blood sugar tester
Insulin controller
Health monitor
Exercise coach
Remote tune-ups
Early warning system

                       24
Mobile Sensor Platform

Identity by 3-axis accelerometer
    Gender (71%)
    Height--tall or short (80%)
    Weight--heavy or light (80%)
    You by your gait (100%)

           Actitracker—Android App



                                     25
+       +          +
     +       +          =
The inanimate becomes
       sentient
                        26
+         +          +
      +         +          =
     Smarter Planet
  Cars drive themselves
Machines know your needs
                           27
+          +       +
           +          +       =
Drive radical efficiencies
Enhance social engagement
Improve information sharing
Enables global reach
Green (automatic routing)
Improve our health
Stop/prevent crime
…                             28
Sensors are Really Big


1	
     Sensors are unbounded


2	
     Sensors are promiscuous


3	
     Sensors are indiscriminate
The Internet of Things is Bigger


1	
     Everything is Connected


2	
     Everything   Communicates

3	
     Everything is a Sensor
That’s the

Really Big Data
 Challenge of the future


                           31
Why
 We
  Care

         32
Why
 We
  Care

         33
Why
 We
  Care

         34
Why
 We
  Care

         35
Impact of Big Data

1	
     Know what we know

2	
     Discover the gaps in our knowledge

3	
     Focus targeting to fill the gaps

4	
     More effective use of expensive or long lead
        collection assets

5	
     Better global coverage to limit surprise

6	
     Enhance understanding and improve analysis
Implications


               37
4 Rules of Big Data

1	
     It’s the data…
                         - Apologies to James Carville




2	
     Power to the people
                         - Apologies to the Black Panthers




3	
     Latency breeds contempt
                         - Apologies to Aesop



4	
     Context, context, context
                         - Apologies to Lord Harold Samuel
It’s the Data…


                 39
Data vs Tools—A History
                            Lesson

•  Sophisticated tools without the data are
   useless

•  Mediocre tools with the data are frustrating


•  Analysts will always opt for frustration over
   futility, if that is their only option
Our Job
1	
     Leverage the Big Data world


2	
     Find the Information that Matters


3	
     Connect the Dots


4	
     Understand the Plans of our Adversaries
             Safeguard our national security
The
Problem


          42
Our Problem: Which 5K

1	
     Don’t know the future value of data


2	
     We cannot connect dots we don’t have


3	
     Traditional, requirements driven, collection
        fails in the Big Data world
               - Can’t task for data you don’t know you do need
               - The few cannot know the needs of the many
               - Global Coverage requires Global Data
Characteristics of Big Data


1	
     More is always better

2	
     Signal to noise only gets worse

3	
     Enumeration not modeling

4	
     Requirements are usually hindsight
Data as a Service

•  Analysts and operators are not data engineers
•  Need insight and understanding
•  Ask a question and get a coherent answer
•  Cannot know what data sets contain
   information of value to them
•  Imbue data services and tools with those
   smarts
•  Smart Data, smart tools, smarter intelligence

                                                   45
Power to the
  People

               46
Today

•  Analytics and tools are hard to use
•  Specialists are required to derive value
•  Skilled people are in short supply
•  Algorithms are dense and arcane
•  Require a lot of hand curation
•  Built for business not for intelligence

                                              47
New Fields of
 Expertise
    Data Scientist
Information Engineer

                       48
Data Science
* Data science combines elements from many
fields:
      Math
      Statistics
      Data Engineering
      Pattern Recognition and Learning
      Advanced Computing
      Visualization
      Uncertainty Modeling
      Data Warehousing
      High performance computing


                                         * Wikipedia
Big Data Democracy Wins


 The power of big data
can only be fully realized
 when it is in the hands
  of the average user


                             50
Tomorrow

•    Elegant, powerful and easy to use tools and
     visualizations

•    Machines to do more of the heavy lifting

•    Intelligent systems that learn from the user

•    Correlation not search

•    “Curiosity layer”– machines that are curious on
     your behalf
7 Universal Constructs for
                  Analytics

People          Events


Places          Concepts


Organizations   Things


Time

                           52
User Built Recipes




                53
Keep it Simple

•  Data Scientists focus on hard problems

•  Build reusable components that anyone
   can apply—Recipes

•  Share them widely—Apps Store/Apps Mall
   —Recipe Book

•  Let users assemble components their way
  •    Experiment and fail quickly to succeed faster
Latency Breeds
   Contempt

                 55
Its All About Speed

•  Hadoop/Map Reduce—batch
  •    Flexible, powerful, slow


•  Equivalent of Real-Time Map/Reduce
  •    Flexible, powerful and fast
  •    Demel, Caffeine, Impala, Apache Drill, Spanner…


•  Recursive Streams processing w/
   complex analytics

•  In-memory—peta-scale RAM architectures
  •    Distributed, in-memory analytics
Tectonic Technology Shifts
Traditional Processing       Mass Analytics/Big Data
             Data on SAN     Data at processor
   Move Data to Question     Move Question to Data
                   Backup    Replication management
          Vertical scaling   Horizontal scaling
   Capacity after demand     Capacity ahead of demand
                       DR    COOP
        Size to peak load    Dynamic/elastic provisioning
                     Tape    SAN
                      SAN    Disk
                     Disk    SSD
              RAM limited    Peta-scale RAM
New Computing Architectures

•  Data close to compute
•  Power at the edge
•  Optical Computing/Optical Bus
•  End of the motherboard—shared pools of
   everything
•  Software defined everything—compute,
   storage, networking, data center
•  Network is the bottleneck and constraint
Context,
    Context,
         Context

                   59
Everything in Your Frame of
                       Reference
•  Widgets—Webtop in context to business

•  Schema on Read—Data in context to your
   question

•  User assembled analytics—answers in
   context to your questions

•  Elastic computing—computing in context
   to your demand
Closing
Thoughts

           61
High Noon
     in the
Information Age

                  62
It is nearly within our grasp
  to compute on all human
    generated information


                            63
FaceBook
    > 1 billion users
> 35% of all photographs

                       64
The inanimate is rapidly
    becoming sentient

Smarter Planet
  Cars drive themselves
     Machines know your needs
                                65
3 Wave of
   rd

 Computing
Cognitive Machines
        Watson
                     66
+           +            +
                   +           +            =
Moving faster than government can keep up

The legal system is woefully behind

What are your rights? Who owns your data?

Driving the pace of social change

Exponentially increasing cyber threats      67
68

THE CIA’S “GRAND CHALLENGES” WITH BIG DATA from Structure:Data 2013

  • 1.
  • 2.
    Beyond Big Data Riding the Technology Wave Ira A. (Gus) Hunt ! Chief Technology Officer
  • 3.
    Our Mission We arethe nation's first line of defense. We accomplish what others cannot accomplish and go where others cannot go. We carry out our mission by: Collecting information that reveals the plans, intentions and capabilities of our adversaries and provides the basis for decision and action. Producing timely analysis that provides insight, warning and opportunity to the President and decisionmakers charged with protecting and advancing America's interests. Conducting covert action at the direction of the President to preempt threats or achieve US policy objectives.
  • 4.
    4 Big Bets 1   Revolutionize Big Data Exploitation –  Acquire, federate, secure and exploit. Grow the haystack, magnify the needles. 2   Accelerate Operational Excellence –  Innovate IT operations and run IT like a business. 3   Serve CIA by supporting the IC –  Assume a leadership role in IC activities that matter to CIA; Build to share 4   Drive Performance through Talent Management –  Focus on continuous learning and diversity of thought, experience, background
  • 5.
    6 Key TechnologyEnablers 0   Secure Mobility –  Immediate, secure and appropriate access to people, data and tools from anywhere at anytime 1   Advanced Mission Analytics—Analytics as a Service –  World-class abilities to discover patterns, correlate information, understand plans and intentions, and find and identify operational targets in a sea of data. Big Data analytics as a service 2   Enterprise Widgets and Services –  A customizable, integrated and adaptive webtop that lets analysts, ops officers, and targeters to “have it their way”. Personalization in context. 3   Security as a Service –  One environment, all data, protected and secure.--ubiquitous encryption, enterprise authentication, audit, DRM, secure ID propagation, and Gold Version C&A. 4   Data Harbor—Data as a Service –  An ultra-high performance data environment that enables CIA missions to acquire, federate, and position and securely exploit huge volumes data. Data in context. 5   Cloud Computing—Infrastructure as a Service –  Capacity ahead of demand. Large scale, elastic, commodity hosting, storage, and compute
  • 6.
  • 7.
    Google > 100 PB > 1T indexed URLs > 3 million servers > 7.2B page-views/day 7
  • 8.
    FaceBook > 1 billion users > 300PB; +> 500TB/day > 35% of world’s photographs 8
  • 9.
    YouTube > 1000PB +>72 hours/minute >37 million hours/year > 4 billion views/day 9
  • 10.
    World Population > 7,057,065,162 10
  • 11.
    Twitter > 124B tweets/year > 390M/day ~4500/sec 11
  • 12.
    Global Text Messages > 6.1T per year > 193,000 per second > 876 per person per year 12
  • 13.
    US Cell Calls > 2.2 T minutes/year > 19 minutes / person / day (uncompressed < 1 YouTube/year) 13
  • 14.
  • 15.
    Social Mobile Cloud 15
  • 16.
    + + = Big Data 16
  • 17.
    + + Increases the velocity of innovation 17
  • 18.
    + + Accelerates social Change 18
  • 19.
  • 20.
    + + Altered the Flow of Information 20
  • 21.
  • 22.
    Nano Bio Sensors 22
  • 23.
    Mobile Sensor Platform Microphone Image 3-axis accelerometer Touch Light Proximity Geolocation Communicator, Tricorder, Transporter 23
  • 24.
    Mobile Health Platform Pacemaker Bloodsugar tester Insulin controller Health monitor Exercise coach Remote tune-ups Early warning system 24
  • 25.
    Mobile Sensor Platform Identityby 3-axis accelerometer Gender (71%) Height--tall or short (80%) Weight--heavy or light (80%) You by your gait (100%) Actitracker—Android App 25
  • 26.
    + + + + + = The inanimate becomes sentient 26
  • 27.
    + + + + + = Smarter Planet Cars drive themselves Machines know your needs 27
  • 28.
    + + + + + = Drive radical efficiencies Enhance social engagement Improve information sharing Enables global reach Green (automatic routing) Improve our health Stop/prevent crime … 28
  • 29.
    Sensors are ReallyBig 1   Sensors are unbounded 2   Sensors are promiscuous 3   Sensors are indiscriminate
  • 30.
    The Internet ofThings is Bigger 1   Everything is Connected 2   Everything Communicates 3   Everything is a Sensor
  • 31.
    That’s the Really BigData Challenge of the future 31
  • 32.
    Why We Care 32
  • 33.
    Why We Care 33
  • 34.
    Why We Care 34
  • 35.
    Why We Care 35
  • 36.
    Impact of BigData 1   Know what we know 2   Discover the gaps in our knowledge 3   Focus targeting to fill the gaps 4   More effective use of expensive or long lead collection assets 5   Better global coverage to limit surprise 6   Enhance understanding and improve analysis
  • 37.
  • 38.
    4 Rules ofBig Data 1   It’s the data… - Apologies to James Carville 2   Power to the people - Apologies to the Black Panthers 3   Latency breeds contempt - Apologies to Aesop 4   Context, context, context - Apologies to Lord Harold Samuel
  • 39.
  • 40.
    Data vs Tools—AHistory Lesson •  Sophisticated tools without the data are useless •  Mediocre tools with the data are frustrating •  Analysts will always opt for frustration over futility, if that is their only option
  • 41.
    Our Job 1   Leverage the Big Data world 2   Find the Information that Matters 3   Connect the Dots 4   Understand the Plans of our Adversaries Safeguard our national security
  • 42.
  • 43.
    Our Problem: Which5K 1   Don’t know the future value of data 2   We cannot connect dots we don’t have 3   Traditional, requirements driven, collection fails in the Big Data world - Can’t task for data you don’t know you do need - The few cannot know the needs of the many - Global Coverage requires Global Data
  • 44.
    Characteristics of BigData 1   More is always better 2   Signal to noise only gets worse 3   Enumeration not modeling 4   Requirements are usually hindsight
  • 45.
    Data as aService •  Analysts and operators are not data engineers •  Need insight and understanding •  Ask a question and get a coherent answer •  Cannot know what data sets contain information of value to them •  Imbue data services and tools with those smarts •  Smart Data, smart tools, smarter intelligence 45
  • 46.
    Power to the People 46
  • 47.
    Today •  Analytics andtools are hard to use •  Specialists are required to derive value •  Skilled people are in short supply •  Algorithms are dense and arcane •  Require a lot of hand curation •  Built for business not for intelligence 47
  • 48.
    New Fields of Expertise Data Scientist Information Engineer 48
  • 49.
    Data Science * Datascience combines elements from many fields: Math Statistics Data Engineering Pattern Recognition and Learning Advanced Computing Visualization Uncertainty Modeling Data Warehousing High performance computing * Wikipedia
  • 50.
    Big Data DemocracyWins The power of big data can only be fully realized when it is in the hands of the average user 50
  • 51.
    Tomorrow •  Elegant, powerful and easy to use tools and visualizations •  Machines to do more of the heavy lifting •  Intelligent systems that learn from the user •  Correlation not search •  “Curiosity layer”– machines that are curious on your behalf
  • 52.
    7 Universal Constructsfor Analytics People Events Places Concepts Organizations Things Time 52
  • 53.
  • 54.
    Keep it Simple • Data Scientists focus on hard problems •  Build reusable components that anyone can apply—Recipes •  Share them widely—Apps Store/Apps Mall —Recipe Book •  Let users assemble components their way •  Experiment and fail quickly to succeed faster
  • 55.
    Latency Breeds Contempt 55
  • 56.
    Its All AboutSpeed •  Hadoop/Map Reduce—batch •  Flexible, powerful, slow •  Equivalent of Real-Time Map/Reduce •  Flexible, powerful and fast •  Demel, Caffeine, Impala, Apache Drill, Spanner… •  Recursive Streams processing w/ complex analytics •  In-memory—peta-scale RAM architectures •  Distributed, in-memory analytics
  • 57.
    Tectonic Technology Shifts TraditionalProcessing Mass Analytics/Big Data Data on SAN Data at processor Move Data to Question Move Question to Data Backup Replication management Vertical scaling Horizontal scaling Capacity after demand Capacity ahead of demand DR COOP Size to peak load Dynamic/elastic provisioning Tape SAN SAN Disk Disk SSD RAM limited Peta-scale RAM
  • 58.
    New Computing Architectures • Data close to compute •  Power at the edge •  Optical Computing/Optical Bus •  End of the motherboard—shared pools of everything •  Software defined everything—compute, storage, networking, data center •  Network is the bottleneck and constraint
  • 59.
    Context, Context, Context 59
  • 60.
    Everything in YourFrame of Reference •  Widgets—Webtop in context to business •  Schema on Read—Data in context to your question •  User assembled analytics—answers in context to your questions •  Elastic computing—computing in context to your demand
  • 61.
  • 62.
    High Noon in the Information Age 62
  • 63.
    It is nearlywithin our grasp to compute on all human generated information 63
  • 64.
    FaceBook > 1 billion users > 35% of all photographs 64
  • 65.
    The inanimate israpidly becoming sentient Smarter Planet Cars drive themselves Machines know your needs 65
  • 66.
    3 Wave of rd Computing Cognitive Machines Watson 66
  • 67.
    + + + + + = Moving faster than government can keep up The legal system is woefully behind What are your rights? Who owns your data? Driving the pace of social change Exponentially increasing cyber threats 67
  • 68.