2. Beyond Big Data
Riding the
Technology Wave
Ira A. (Gus) Hunt
!
Chief Technology Officer
3. Our Mission
We are the nation's first line of defense. We accomplish
what others cannot accomplish and go where others
cannot go. We carry out our mission by:
Collecting information that reveals the plans, intentions and
capabilities of our adversaries and provides the basis for
decision and action.
Producing timely analysis that provides insight, warning and
opportunity to the President and decisionmakers charged with
protecting and advancing America's interests.
Conducting covert action at the direction of the President to
preempt threats or achieve US policy objectives.
4. 4 Big Bets
1
Revolutionize Big Data Exploitation
– Acquire, federate, secure and exploit. Grow the haystack, magnify the needles.
2
Accelerate Operational Excellence
– Innovate IT operations and run IT like a business.
3
Serve CIA by supporting the IC
– Assume a leadership role in IC activities that matter to CIA; Build to share
4
Drive Performance through Talent Management
– Focus on continuous learning and diversity of thought, experience, background
5. 6 Key Technology Enablers
0
Secure Mobility
– Immediate, secure and appropriate access to people, data and tools from anywhere at anytime
1
Advanced Mission Analytics—Analytics as a Service
– World-class abilities to discover patterns, correlate information, understand plans and intentions,
and find and identify operational targets in a sea of data. Big Data analytics as a service
2
Enterprise Widgets and Services
– A customizable, integrated and adaptive webtop that lets analysts, ops officers, and
targeters to “have it their way”. Personalization in context.
3
Security as a Service
– One environment, all data, protected and secure.--ubiquitous encryption, enterprise
authentication, audit, DRM, secure ID propagation, and Gold Version C&A.
4
Data Harbor—Data as a Service
– An ultra-high performance data environment that enables CIA missions to acquire,
federate, and position and securely exploit huge volumes data. Data in context.
5
Cloud Computing—Infrastructure as a Service
– Capacity ahead of demand. Large scale, elastic, commodity hosting, storage, and compute
25. Mobile Sensor Platform
Identity by 3-axis accelerometer
Gender (71%)
Height--tall or short (80%)
Weight--heavy or light (80%)
You by your gait (100%)
Actitracker—Android App
25
36. Impact of Big Data
1
Know what we know
2
Discover the gaps in our knowledge
3
Focus targeting to fill the gaps
4
More effective use of expensive or long lead
collection assets
5
Better global coverage to limit surprise
6
Enhance understanding and improve analysis
38. 4 Rules of Big Data
1
It’s the data…
- Apologies to James Carville
2
Power to the people
- Apologies to the Black Panthers
3
Latency breeds contempt
- Apologies to Aesop
4
Context, context, context
- Apologies to Lord Harold Samuel
40. Data vs Tools—A History
Lesson
• Sophisticated tools without the data are
useless
• Mediocre tools with the data are frustrating
• Analysts will always opt for frustration over
futility, if that is their only option
41. Our Job
1
Leverage the Big Data world
2
Find the Information that Matters
3
Connect the Dots
4
Understand the Plans of our Adversaries
Safeguard our national security
43. Our Problem: Which 5K
1
Don’t know the future value of data
2
We cannot connect dots we don’t have
3
Traditional, requirements driven, collection
fails in the Big Data world
- Can’t task for data you don’t know you do need
- The few cannot know the needs of the many
- Global Coverage requires Global Data
44. Characteristics of Big Data
1
More is always better
2
Signal to noise only gets worse
3
Enumeration not modeling
4
Requirements are usually hindsight
45. Data as a Service
• Analysts and operators are not data engineers
• Need insight and understanding
• Ask a question and get a coherent answer
• Cannot know what data sets contain
information of value to them
• Imbue data services and tools with those
smarts
• Smart Data, smart tools, smarter intelligence
45
47. Today
• Analytics and tools are hard to use
• Specialists are required to derive value
• Skilled people are in short supply
• Algorithms are dense and arcane
• Require a lot of hand curation
• Built for business not for intelligence
47
48. New Fields of
Expertise
Data Scientist
Information Engineer
48
49. Data Science
* Data science combines elements from many
fields:
Math
Statistics
Data Engineering
Pattern Recognition and Learning
Advanced Computing
Visualization
Uncertainty Modeling
Data Warehousing
High performance computing
* Wikipedia
50. Big Data Democracy Wins
The power of big data
can only be fully realized
when it is in the hands
of the average user
50
51. Tomorrow
• Elegant, powerful and easy to use tools and
visualizations
• Machines to do more of the heavy lifting
• Intelligent systems that learn from the user
• Correlation not search
• “Curiosity layer”– machines that are curious on
your behalf
52. 7 Universal Constructs for
Analytics
People Events
Places Concepts
Organizations Things
Time
52
54. Keep it Simple
• Data Scientists focus on hard problems
• Build reusable components that anyone
can apply—Recipes
• Share them widely—Apps Store/Apps Mall
—Recipe Book
• Let users assemble components their way
• Experiment and fail quickly to succeed faster
56. Its All About Speed
• Hadoop/Map Reduce—batch
• Flexible, powerful, slow
• Equivalent of Real-Time Map/Reduce
• Flexible, powerful and fast
• Demel, Caffeine, Impala, Apache Drill, Spanner…
• Recursive Streams processing w/
complex analytics
• In-memory—peta-scale RAM architectures
• Distributed, in-memory analytics
57. Tectonic Technology Shifts
Traditional Processing Mass Analytics/Big Data
Data on SAN Data at processor
Move Data to Question Move Question to Data
Backup Replication management
Vertical scaling Horizontal scaling
Capacity after demand Capacity ahead of demand
DR COOP
Size to peak load Dynamic/elastic provisioning
Tape SAN
SAN Disk
Disk SSD
RAM limited Peta-scale RAM
58. New Computing Architectures
• Data close to compute
• Power at the edge
• Optical Computing/Optical Bus
• End of the motherboard—shared pools of
everything
• Software defined everything—compute,
storage, networking, data center
• Network is the bottleneck and constraint
60. Everything in Your Frame of
Reference
• Widgets—Webtop in context to business
• Schema on Read—Data in context to your
question
• User assembled analytics—answers in
context to your questions
• Elastic computing—computing in context
to your demand
63. It is nearly within our grasp
to compute on all human
generated information
63
64. FaceBook
> 1 billion users
> 35% of all photographs
64
65. The inanimate is rapidly
becoming sentient
Smarter Planet
Cars drive themselves
Machines know your needs
65
66. 3 Wave of
rd
Computing
Cognitive Machines
Watson
66
67. + + +
+ + =
Moving faster than government can keep up
The legal system is woefully behind
What are your rights? Who owns your data?
Driving the pace of social change
Exponentially increasing cyber threats 67