Big Data @ Land O’Lakes:
Digital Command Center
Dwayne Beberg
Chakra Sankaraiah
Agenda
• Land O’Lakes Overview
• Big Data @ Land O’Lakes
• Digital Command Center (DCC)
• Why NiFi
• Why HDFS & Hive
• Why Elasticsearch
• Questions
Land O’Lakes Overview
We operate four diversified agribusinesses, driven by insights & innovation
• Purina: animal nutrition and feed
• WinField United: ag services, crop inputs, precision agriculture
• Land O’Lakes: dairy foods and ingredients
• Land O’Lakes Sustain: sustainability
Land O’Lakes, Inc. Overview
• Fortune 500: #215
• 3rd largest U.S. member-owned cooperative
• 10,000 employees
• Locations: 50 states, 50+ countries
As one of the largest farmer-owned cooperatives, Land O’Lakes has a broad reach into American agriculture
• Total membership: 3,825
• Serving 300,000+ agricultural producers with 60M acres of farmland
Our unique farm-to-fork business model enables us to see things differently
[Diagram labels: Farmer-owned; Seed Production; Crop Inputs & Insights; Animal Nutrition; Milk Quality; Primary Processing; R&D / Manufacturing; B2B / Industrial Marketing & Sales; Branded Goods Marketing & Sales; Consumer; International Development]
Big Data @ Land O’Lakes
Data Driven Vision
Strategic Organization – Data Driven Verticals

Data Architecture
• Vertical Lead: Dwayne Beberg
• Roles & Responsibilities: Data Architects have strong domain expertise to discover, design, and develop trusted data driven solutions for visualization, research, and data science activities
• Technologies / Open Source Projects: EDL = Hive, HBase, Elastic, Atlas; EDW = relational and columnar; Other = EDQ, DRM, and ERwin
• Current Projects: Pricing Optimization; Integrated Marketing Analytics; Demand Signal Management

Big Data and Advanced Analytics
• Vertical Lead: Chakra Sankaraiah
• Roles & Responsibilities: Data Scientists use Machine Learning & advanced linear and non-linear mathematical techniques to model business problems and turn data into business insights
• Technologies / Open Source Projects: EDL = Python, Zeppelin, Jupyter, Machine Learning, H2O.ai, Watson APIs; Other = Endeca Information Discovery
• Current Projects: WinField United Advanced Analytics; Purina R&D Big Data Analytics; Manufacturing Analytics

Data Visualization
• Vertical Lead: Rich Bellefeuille
• Roles & Responsibilities: Data Visualization Designers create interactive visualizations and dashboards that provide actionable insights supporting operational and self-serve analytics
• Technologies / Open Source Projects: Operational = OBIEE; Self-Serve = MS Power BI; Real-Time = SparkR, Kibana; Other = R, D3 (applied within apps)
• Current Projects: WinField United Embedded BI; Dairy Foods BI Trade Promotions; Sustain AI & Answer Plot

Data Engineering
• Vertical Lead: Joel Lipetzky
• Roles & Responsibilities: Data acquisition, data integration, data flow, and data transformation
• Technologies / Open Source Projects: Batch = ODI, Informatica; Stream = NiFi, Spark Streaming, Kafka; IoT = Event/IoT Hub, OSISoft – PI; EAI = Fusion Middleware, API Mgr; EDI = E2Open, Sterling Integrator
• Current Projects: WinField United ERP; Manufacturing Analytics; Foodservice CRM
Strategic Technologies
Utilizing Advanced Analytics Leads to Three
Categories of Business Opportunities:
Discovering hidden insights
• Infinity Insurance — It text-mined years of adjuster reports to identify key indicators of
fraudulent claims. It reduced fraud 75%, and eliminated marketing to customers with a
higher likelihood of submitting false claims.
Making better-informed decisions
• Agco — It conducted pattern analysis of thousands of configuration options for farm
machinery to determine optimal base configurations and real-time customer demand. It
reduced product variety by 61% and slashed days of inventory 81%, while maintaining
service levels.
Automating business processes
• McDonald's — Bakery operation photo-analyzes over 1,000 buns per minute for color,
shape and seed distribution to continually adjust ovens and other equipment. It saves
thousands of pounds of wasted product, speeds production and saves energy.
Advanced Analytics – Use Case Maturity Roadmap
[Chart: use cases plotted by maturity stage (Awareness, Exploration, Optimization, Transformation) and estimated effort (Low, Medium, High); bubble size = value; grouped by WinField United, Dairy Foods, Purina, Corporate, and Sustainability]
No  Use Case Name
1   Pricing Optimization Project
2   Purinalytics
3   Supply Chain Optimization Project
4   Equinox
5   Marketing Analytics
6   Prospex
7   Purina Animal Nutrition (PANDA)
8   Butter/Power Profitability
9   Image recognition
10  AnswerPlot Trial Optimization
11  Operator 360°
12  Spend Optimization
13  Deduction Management
14  Demand Sensing
15  Product Performance Predictive Modeling
16  Drone Challenge
17  Sustainability Scorecard
18  TOP SECRET 1
19  TOP SECRET 2
20  TOP SECRET 3
Digital Command Center (DCC)
Digital Command Center Project
Objective: Showcase and amplify visualization of complex digital data to increase awareness and perceptions of Digital Marketing innovation.
Digital Command Center Project:
Launch a Digital Command Center in the new AH 4th floor workspace. The DCC is a large wall featuring multiple touch-screen monitors that display digital marketing data around four key topics:
• Social Media touchpoints, including Facebook, Twitter, Pinterest & Instagram, help us track social media impressions.
• Click Stream Information for our websites helps us understand which campaigns are driving online impressions.
• Internet Search Results for our LOL terms and products help us with SEO.
• Nerve Center shows network charts around topics of interest that can lead to new product innovation.
Reference Architecture for Marketing Analytics
[Diagram: marketing and enterprise data sources feed the Integrated Marketing Analytics data lake, which serves reporting tools and the Digital Command Center. Labels recovered from the diagram:]
• Industry Marketing Analytics Platforms: Datorama / Origami Logic / Beckon
• Marketing channel sources: Campaign Mgmt, Social Channels, Adobe Analytics, Google Analytics, Mobile, Web, Clickstream, Email, Budgeting System, SEO / SERP, syndicated data
• Enterprise Data Warehouse: ERP, CRM, SCM, POS
• Integrated Marketing Analytics (Data Lake): Search Data, Social Data, External Social Data, Click Stream, Website Interactions, Campaign, Email, Budget & Forecast, Network models, Spark, R
• Reporting: Power BI, Kibana, Tableau, Web Reports, Custom Interactive Reports, D3.js Visuals, Digital Command Center
• Integration: Marketing, Enterprise, Sales Platform, Marketing Platform, Partner Platform, Consumer Platform
DCC Architecture
[Diagram: EDL layers from data sources through to data access. Labels recovered from the diagram:]
• Data Sources: Documents, Emails; Web Logs, Click Streams; Social Networks; Machine logs; Sensor Data & IoT; Geolocation Data; OLTP, ERP, CRM
• EDL (Data Ingestion): Real Time, Batch, Logstash
• EDL (Data Transformation & Storage): NoSQL Storage, Hadoop Storage, Index Storage (Elasticsearch)
• EDL (Data Science & Discovery): Data Science, Discovery & Access
• Data Access: Email, Mobile App, Apps & Websites, Notification, Reports & Dashboard
• Operations, Security & Governance: Kerberos
DCC Solution Components
Ingestion: We built data flows for clickstream, social media, and Google search data using Apache NiFi. Each data flow loads the data to multiple destinations within a single flow.
Elasticsearch Discovery & API access: Elasticsearch storage provides sub-second responses via API calls and supports data discovery using Kibana. We also exposed the Elasticsearch index as a Hive table so that it can be accessed easily via BI tools.
Engagement Layer: We used crowdsourcing to come up with an elegant visualization and implemented it using JS and D3. The visualization dashboard uses the Elasticsearch index as its backend, accessed via API calls.
Store & Analyze: We used HDFS for raw storage and Hive external & internal tables to run ad hoc queries using Tableau/Power BI. Raw storage lets us pull any additional attributes that may be needed in the future. Hive gives us a performance-optimized, cleaner data model for ad hoc super users to consume.
Why NiFi
1. Multiple flow types in a single tool
• Flow Type 1: History Load
• Flow Type 2: Nightly Batch
• Flow Type 3: Real Time
2. Multiple Configuration & Routing
Multiple API calls based on BUs
The same API call is made with different parameters for 4 different BUs, 4 different platforms (Facebook, Twitter, Instagram, Pinterest), and 3 different metrics: a total of 48 calls handled in a couple of processors.
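The 4 x 4 x 3 arithmetic above can be sketched as a simple parameter expansion. This is only an illustration in Python, not the NiFi flow itself; the BU and metric names are placeholders, since the slide names only the four platforms.

```python
from itertools import product

# Placeholder values: the slide names the platforms but not the BUs or metrics.
BUSINESS_UNITS = ["bu_1", "bu_2", "bu_3", "bu_4"]
PLATFORMS = ["facebook", "twitter", "instagram", "pinterest"]
METRICS = ["metric_1", "metric_2", "metric_3"]


def build_call_parameters():
    """Return one parameter set per (BU, platform, metric) combination."""
    return [
        {"business_unit": bu, "platform": platform, "metric": metric}
        for bu, platform, metric in product(BUSINESS_UNITS, PLATFORMS, METRICS)
    ]


calls = build_call_parameters()
print(len(calls))  # 4 BUs x 4 platforms x 3 metrics = 48
```

Driving one call template from a parameter list like this is what keeps the processor count small: the routing logic stays the same and only the parameters vary.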
3. Custom processor makes code easy to reuse
Write reusable custom code in Java and package it as a NiFi processor. From there on you can drag, drop, and reuse it.
Schedule a token refresh at any required interval and generate a reusable token file for all processors.
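The shared token file idea can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Java processor from the talk: `get_token`, the JSON file layout, and the stand-in `fake_fetch_token` are all hypothetical.

```python
import json
import tempfile
import time
from pathlib import Path


def get_token(token_file, fetch_token, max_age_seconds, now=None):
    """Return the cached token, refreshing the shared file once it is too old."""
    now = time.time() if now is None else now
    path = Path(token_file)
    if path.exists():
        cached = json.loads(path.read_text())
        if now - cached["fetched_at"] < max_age_seconds:
            return cached["token"]
    # Cache is missing or stale: fetch a new token and persist it for all readers.
    token = fetch_token()
    path.write_text(json.dumps({"token": token, "fetched_at": now}))
    return token


# Stand-in fetcher (hypothetical); a real flow would call the platform's auth endpoint.
issued = []


def fake_fetch_token():
    issued.append(f"token-{len(issued) + 1}")
    return issued[-1]


with tempfile.TemporaryDirectory() as tmp:
    token_path = Path(tmp) / "token.json"
    first = get_token(token_path, fake_fetch_token, max_age_seconds=3600, now=0)
    second = get_token(token_path, fake_fetch_token, max_age_seconds=3600, now=1800)
    third = get_token(token_path, fake_fetch_token, max_age_seconds=3600, now=3600)
    print(first, second, third)  # token-1 token-1 token-2
```

Every consumer reads the same file, so only one refresh happens per interval regardless of how many processors need the token.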
4. Real-time advanced analytics
We get real-time sentiment information from an open source NLP (Natural Language Processing) API.
We can use any of the AI-based APIs available from IBM Watson, Microsoft Cognitive Services, or Google's machine learning APIs and plug them into our overall data flow within NiFi. This is a sample of what Google's Vision API returns for some of the images.
5. Looping through multiple pages of data
Pagination is common when we make API calls to applications such as the NOAA APIs, Google Analytics, etc.
We handled it with a loop within NiFi that steps through the limit and offset values, ensuring that it retrieves all the values in a micro-batch fashion.
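The limit/offset loop can be sketched outside NiFi as below. This is a Python illustration only; `fetch_all` and the in-memory `fake_fetch_page` are hypothetical stand-ins for a real paginated API, and the stop condition (a page shorter than the limit) is an assumption about how the API signals the last page.

```python
def fetch_all(fetch_page, limit):
    """Collect every record by walking limit/offset pages until a short page arrives."""
    records = []
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        records.extend(page)
        if len(page) < limit:
            break  # a short (or empty) page means there is nothing left to read
        offset += limit
    return records


# Stand-in for a paginated API: 23 records served in slices.
DATA = list(range(23))


def fake_fetch_page(limit, offset):
    return DATA[offset:offset + limit]


all_records = fetch_all(fake_fetch_page, limit=10)
print(len(all_records))  # 23
```

Each iteration handles one page, which matches the micro-batch behavior described above: pages of 10, 10, and 3 records in this example.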
6. Fetch once & load to multiple destinations
• Stream to Hive as the records come in, for instant access by BI tools
• Store all attributes in their rawest format for broader discovery
• Send to other systems that need the same data instantaneously
• Provide sub-second API access to the data and Kibana-based data discovery
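The fetch-once, deliver-many pattern can be sketched as a small fan-out. This is a Python illustration, not the NiFi flow; the sink names and the in-memory lists standing in for Hive, raw storage, and the search index are hypothetical.

```python
def fan_out(records, sinks):
    """Deliver every fetched record to every destination; return per-sink counts."""
    counts = {name: 0 for name in sinks}
    for record in records:
        for name, write in sinks.items():
            write(record)
            counts[name] += 1
    return counts


# In-memory stand-ins for the real destinations.
hive_rows, raw_store, search_index = [], [], []
sinks = {
    "hive": hive_rows.append,
    "raw_storage": raw_store.append,
    "search_index": search_index.append,
}

records = [{"id": 1}, {"id": 2}]
counts = fan_out(records, sinks)
print(counts)  # {'hive': 2, 'raw_storage': 2, 'search_index': 2}
```

The source is read a single time, so adding a destination means registering one more sink rather than building another extract.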
7. Handle data drift
Day 1: App1 data extract → Vendor_csv source file (Vendor_id, Vendor_name, Vendor_address) → App1 Hive table
Day 7: App1 data extract → Vendor_csv source file (Vendor_id, Vendor_name, Vendor_parent_name, Vendor_address) → App1 Hive table
Schema change: the Day 7 source file adds the Vendor_parent_name column.
Use NiFi InferAvroSchema & ConvertCSVToAvro to manage schema change.
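The Day 1 versus Day 7 comparison can be sketched as a header check. This Python snippet does not reproduce what InferAvroSchema or ConvertCSVToAvro do internally; it only illustrates detecting the added column, and the sample row values are made up.

```python
import csv
import io


def read_header(csv_text):
    """Return the column names from the first row of a CSV document."""
    return next(csv.reader(io.StringIO(csv_text)))


def detect_drift(known_columns, csv_text):
    """Compare a file's header with the known columns; return (added, removed)."""
    header = read_header(csv_text)
    added = [column for column in header if column not in known_columns]
    removed = [column for column in known_columns if column not in header]
    return added, removed


# Sample files: column names come from the slide, row values are invented.
day1 = "Vendor_id,Vendor_name,Vendor_address\n1,Acme,12 Main St\n"
day7 = (
    "Vendor_id,Vendor_name,Vendor_parent_name,Vendor_address\n"
    "1,Acme,Acme Holdings,12 Main St\n"
)

known = read_header(day1)
added, removed = detect_drift(known, day7)
print(added, removed)  # ['Vendor_parent_name'] []
```

Inferring the schema from each incoming file, rather than hard-coding it, is what lets a new column flow through without breaking the load.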
8. NiFi from Hortonworks DataFlow is enterprise grade
• Scalable: Distributed architecture; at the same time, MiNiFi provides edge capability
• Operations & Monitoring: Ambari along with the NiFi GUI provides a great administration experience
• Security: SSL, LDAP authentication, and Ranger authorization
• Provenance & Audit: Provides fine-grained lineage. Also tracks each record and each change using its audit mechanism
• Easy to Develop & Maintain: GUI-driven development & maintenance
Why HDFS & Hive
1. Store raw form and have external tables
Approx. 900 Attributes
Approx. 200 Attributes
Example:
-- External table over the raw JSON files: one string column per record
CREATE EXTERNAL TABLE IF NOT EXISTS campaign_table (
  rawjson string
)
LOCATION 'XXXXX/campaign_data/';

-- Pull only the attributes needed out of the raw JSON at query time
SELECT get_json_object(campaign_table.rawjson, '$.campaign_section_id') AS campaign_section_id,
       get_json_object(campaign_table.rawjson, '$.campaign_id') AS campaign_id,
       get_json_object(campaign_table.rawjson, '$.title') AS title
FROM campaign_table;
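What the get_json_object projection does to each raw row can be mimicked in a few lines of Python. This is an illustrative sketch, not Hive's implementation: it handles only top-level keys, and the sample record values are invented.

```python
import json


def extract_attributes(raw_json, keys):
    """Pull selected top-level attributes out of one raw JSON string.

    Missing keys and unparseable input yield None, which loosely mirrors
    a NULL result in the Hive query.
    """
    try:
        document = json.loads(raw_json)
    except json.JSONDecodeError:
        return {key: None for key in keys}
    if not isinstance(document, dict):
        return {key: None for key in keys}
    return {key: document.get(key) for key in keys}


# Invented sample row; the real records carry hundreds of attributes.
raw_row = '{"campaign_section_id": 7, "campaign_id": 42, "title": "Spring", "extra": "x"}'
wanted = ["campaign_section_id", "campaign_id", "title"]

row = extract_attributes(raw_row, wanted)
print(row)  # {'campaign_section_id': 7, 'campaign_id': 42, 'title': 'Spring'}
```

Because the raw string is kept intact, an attribute that was not projected at first (such as `extra` here) can still be added to the query later without reloading data.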
2. We have to bring it all together via a data model
3. Ability to use an Elastic index in Hive and vice versa
This jar enables interaction between Elastic & Hadoop: elasticsearch-hadoop-2.4.4.jar
Sample Hive table:
CREATE EXTERNAL TABLE `noaa`(
`elevation` string COMMENT 'from deserializer',
`station_name` string COMMENT 'from deserializer',
`mindate` string COMMENT 'from deserializer',
`maxdate` string COMMENT 'from deserializer',
`station_id` string COMMENT 'from deserializer',
`location` string COMMENT 'from deserializer',
`elevationunit` string COMMENT 'from deserializer')
ROW FORMAT SERDE
'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY
'org.elasticsearch.hadoop.hive.EsStorageHandler'
WITH SERDEPROPERTIES (
'serialization.format'='1')
LOCATION
'hdfs://XXXXXX/noaa'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{"BASIC_STATS":"true"}',
'es.nodes'='elasticserver',
'es.query'='?q=*',
'es.resource'='noaa_stations_data',
'numFiles'='0',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='0',
'transient_lastDdlTime'='1490821856')
Why Elasticsearch
1. Sub-second API response time for on the fly aggregates
2. Data discovery with Kibana
3. Real time dashboard capability with Kibana
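An on-the-fly aggregate of the kind mentioned in point 1 is typically expressed as a terms aggregation with a nested metric. The sketch below only builds the request body in Python; the field names `platform` and `impressions` are assumptions, since the slides do not show the index mapping.

```python
import json


def build_aggregation_query(group_field, sum_field):
    """Build an Elasticsearch request body that sums one field per group."""
    return {
        "size": 0,  # return only the aggregation buckets, not the matching documents
        "aggs": {
            "by_group": {
                "terms": {"field": group_field},
                "aggs": {"total": {"sum": {"field": sum_field}}},
            }
        },
    }


# Hypothetical field names for illustration.
query = build_aggregation_query("platform", "impressions")
print(json.dumps(query, indent=2))
```

The body would be sent to the index's `_search` endpoint; the aggregation is computed at query time, so no pre-built summary table is needed.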
csankaraiah@landolakes.com
dmbeberg@landolakes.com
Questions?
THANK YOU
