© Hortonworks Inc. 2011 – 2017. All Rights Reserved
Modern Application Architecture using Data.gov
Devin Pinkston | Solutions Engineer
Ian Brooks | Solutions Engineer
Henry Sowell | Technical Director
HORTONWORKS DATA PLATFORM
Ongoing Innovation in Apache

Component categories: DATA MGMT, DATA ACCESS, GOVERNANCE & INTEGRATION, OPERATIONS, SECURITY

Component       HDP 2.0         HDP 2.1         HDP 2.2         HDP 2.3         HDP 2.4         HDP 2.5         HDP 2.6*
                Oct 2013        April 2014      Dec 2014        Oct 2015        Mar 2016        Aug 2016        1H2017
Hadoop & YARN   2.2.0           2.4.0           2.6.0           2.7.1           2.7.1           2.7.3           2.7.3
Sqoop           1.4.4           1.4.4           1.4.5           1.4.6           1.4.6           1.4.6           1.4.6
Druid           -               -               -               -               -               -               0.9.2
Knox            -               0.4.0           0.5.0           0.6.0           0.6.0           0.9.0           0.11.0
Ranger          -               -               0.4.0           0.5.0           0.5.0           0.6.0           0.7.0
Ambari          1.4.4           1.5.1           2.0.0           2.1.0           2.2.1           2.4.0           2.5.0
Kafka           -               -               0.8.1           0.8.2           0.9.0           0.10.0          0.10.1.0
Zookeeper       3.4.5           3.4.5           3.4.6           3.4.6           3.4.6           3.4.6           3.4.6
Flume           1.3.1           1.4.0           1.5.2           1.5.2           1.5.2           1.5.2           1.5.2
Solr****        -               4.7.2           4.10.2          5.2.1           5.2.1           5.5.1           5.5.1
Slider          -               -               0.60.0          0.80.0          0.80.0          0.91.0          0.91.0
Atlas           -               -               -               0.5.0           0.5.0           0.7.0           0.8.0
Accumulo        -               1.5.1           1.6.1           1.7.0           1.7.0           1.7.0           1.7.0
Phoenix         -               4.0.0           4.2.0           4.4.0           4.4.0           4.7.0           4.7.0
Storm           -               0.9.1           0.9.3           0.10.0          0.10.0          1.0.1           1.1.0
Falcon          -               0.5.0           0.6.0           0.6.1           0.6.1           0.10.0          0.10.0
Tez             -               0.4.0           0.5.2           0.7.0           0.7.0           0.7.0           0.7.0
Hive            0.12.0          0.13.0          0.14.0          1.2.1           1.2.1           1.2.1+, 2.1***  1.2.1+, 2.1***
Pig             0.12.0          0.12.1          0.14.0          0.15.0          0.15.0          0.16.0          0.16.0
Oozie           3.3.2           4.0.0           4.1.0           4.2.0           4.2.0           4.2.0           4.2.0
Spark           -               -               1.2.1           1.4.1           1.6.0           1.6.2+, 2.0**   1.6.3+, 2.1**
HBase           0.96.1          0.98.0          0.98.4          1.1.2           1.1.2           1.1.2           1.1.2
Zeppelin        -               -               -               -               -               0.6.0           0.7.0

* HDP 2.6 – Shows current Apache branches being used. Final component version subject to change based on Apache release process.
** Spark 1.6.3+ Spark 2.1 – HDP 2.6 supports both Spark 1.6.3 and Spark 2.1 as GA.
*** Hive 2.1 is GA within HDP 2.6.
**** Apache Solr is available as an add-on product HDP Search.
How Do We Handle The Data?
Actionable Intelligence from Connected Data Platforms
  • Capturing perishable insights from data in motion
  • Ensuring rich, historical insights on data at rest
  • Necessary for modern data applications
Single Aggregation Site to Hadoop Cluster Architecture
[Architecture diagram. Labels recovered from the figure:]
  • Collection Sources (NiFi)
  • Aggregation Site (NiFi)
  • Kafka
  • Movement Across Networks
  • Storm
  • Content-based routing/enrichment
  • NiFi Put Ingest to Apache Accumulo
  • Core Hadoop Cluster: Stream, Data Access, Accumulo
  • HDFS (Hadoop Distributed File System)
  • YARN (Cluster Resource Management)
  • HCatalog: Shared Table & User Defined Metadata for All Workloads
  • Ambari: Provision, Manage and Monitor Cluster Resources
  • Incident REST API
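
The diagram labels one stage "Content-based routing/enrichment". The slides do not show how that stage is configured, so the following is only a minimal plain-Python sketch of the idea, not the NiFi or Storm implementation used in the talk. The function names, the destination names (`accumulo_ingest`, `needs_review`), and the format of the derived `location` value are all hypothetical; only the field names `x`, `y`, `location`, and `incidentNum` come from the incident structure slide.

```python
from typing import Dict

Record = Dict[str, str]


def enrich(record: Record) -> Record:
    """Return a copy with a derived `location` when both coordinates exist."""
    enriched = dict(record)
    if "location" not in enriched and "x" in enriched and "y" in enriched:
        # Assumed format; the slides do not define how <location> is encoded.
        enriched["location"] = f"({enriched['y']}, {enriched['x']})"
    return enriched


def route(record: Record) -> str:
    """Pick a destination from the record's content (hypothetical names)."""
    if "location" not in record:
        return "needs_review"
    return "accumulo_ingest"


# Invented sample records, for illustration only.
records = [
    {"incidentNum": "A1", "x": "-122.4", "y": "37.7"},
    {"incidentNum": "A2"},
]
routed = [(r["incidentNum"], route(enrich(r))) for r in records]
print(routed)
```

The point of the sketch is the ordering: enrichment runs first so that the routing decision can depend on derived fields as well as the fields that arrived with the record.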
Basic Accumulo Incident Structure

Row               <incident_epoch>
Column Family     “Incident”                                              “Geo”                               “temporal”
Column Qualifier  <cat>, <descript>, <pdDistrict>, <incidentNum>, <Pdid>  <address>, <x>, <y>, <location>     <dayOfWeek>, <date>, <time>
Value             <value>                                                 <value>                             <value>
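
The structure above can be sketched as a mapping from one incident record to a list of (row, column family, column qualifier, value) entries. This is a plain-Python illustration under stated assumptions, not the ingest code from the talk: it does not call the Accumulo client API, the `to_entries` helper is hypothetical, and the sample values are invented. The family and qualifier names are taken from the slide.

```python
from typing import Dict, List, Tuple

# Column families and their qualifiers, as listed on the slide.
SCHEMA: Dict[str, List[str]] = {
    "Incident": ["cat", "descript", "pdDistrict", "incidentNum", "Pdid"],
    "Geo": ["address", "x", "y", "location"],
    "temporal": ["dayOfWeek", "date", "time"],
}

Entry = Tuple[str, str, str, str]  # (row, family, qualifier, value)


def to_entries(incident_epoch: int, record: Dict[str, str]) -> List[Entry]:
    """Flatten one incident record into Accumulo-style key/value entries."""
    row = str(incident_epoch)  # the row ID is the incident's epoch timestamp
    entries: List[Entry] = []
    for family, qualifiers in SCHEMA.items():
        for qualifier in qualifiers:
            if qualifier in record:  # skip fields missing from the record
                entries.append((row, family, qualifier, str(record[qualifier])))
    return entries


# Hypothetical sample record; the values are invented for illustration.
sample = {
    "cat": "EXAMPLE",
    "pdDistrict": "EXAMPLE DISTRICT",
    "x": "-122.4",
    "y": "37.7",
    "dayOfWeek": "Monday",
}
entries = to_entries(1483228800, sample)
for entry in entries:
    print(entry)
```

Each field becomes its own entry under the shared row ID, so a scan over one row returns the whole incident, while restricting the scan to a single column family (for example “Geo”) returns only that slice of it.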

Enabling Modern Application Architecture using Data.gov open government data


Editor's Notes

  • #4 Hortonworks: Powering the Future of Data