SlideShare a Scribd company logo
© 2015 MapR Technologies© 2015 MapR Technologies
Exploring Enterprise Networks with Familiar BI Tools
© 2015 MapR Technologies
On the Menu
• Discovery: why Hadoop + BI tools for analyzing networks?
• Network analysis in a BI context
• Apache Drill
• Connecting BI tools to network data
• Practical examples with Drill and BI
– Querying packets with Tableau
– Troubleshooting with SAP Lumira
– Gaining insight into customer experience across multiple sources
– Using built-in Drill features for faster analysis
• Summary, conclusions, more resources
© 2015 MapR Technologies
Topics not covered in detail…
• Packet capture architectures
• Ways to capture packets effectively
• Large-scale packet processing – others have done this
• Comparison of BI tools
• Survey of the best SQL-on-Hadoop technology
© 2015 MapR Technologies
There’s a lot happening in your network…
• Packets, logs, interconnections
• Many layers (L1-L7), “L8”
• Network data is multi-faceted…
– It’s serialized and highly structured
– It facilitates communication between heterogeneous devices via
common protocols
– But it’s not structured to be stored and analyzed
– The application often doesn’t care
– Consequently, specialized tooling and software is required
© 2015 MapR Technologies
Why Hadoop + BI tools?
• What does Hadoop enable that makes it a powerful tool for
network analytics?
• What’s new that wasn’t previous possible/desirable?
• How does it augment existing solutions?
• It’s many things:
– New ways of accessing semi-structured data from the network
– Offloading of existing data warehouses and tools
– Combining, joining, blending network captures with other sources
– Many network tools cannot answer questions about your business and
customers
– You can use SQL to get a lot of the answers you need
© 2015 MapR Technologies
New Data Sources Unlock New Insights & Apps
Existing structured data
• Well-defined and well-
understood schema
– OLTP data
– Data warehouse data
– End user data stores (e.g.,
Excel)
New multi-structured data
• Typically un-modeled,
different in format
– Network data
– Clickstream data
– Sensor data
– Rich media (e.g., audio, video)
– Documents
… both types are needed today for deeper insights
© 2015 MapR Technologies
1980 2000 20101990 2020
Fixed schema
DBA controls structure
Dynamic / Flexible schema
Application controls structure
NON-RELATIONAL DATASTORESRELATIONAL DATABASES
GBs-TBs TBs-PBsVolume
Database
Network data, like other data, is increasingly Stored in Non-
Relational Datastores
Structure
Development
Structured Structured, semi-structured and unstructured
Planned (release cycle = months-years) Iterative (release cycle = days-weeks)
© 2015 MapR Technologies
Apache Drill Brings Flexibility & Performance
Access to any data type, any data source
• Relational
• Nested data
• Schema-less
Rapid time to insights
• Query data in-situ
• No Schemas required
• Easy to get started
Integration with existing tools
• ANSI SQL
• BI tool integration
Scale in all dimensions
• TB-PB of scale
• 1000’s of users
• 1000’s of nodes
Granular Security
• Authentication
• Row/column level controls
• De-centralized
© 2015 MapR Technologies
Granular security permissions through Drill views
Name City State Credit Card #
Dave San Jose CA 1374-7914-3865-4817
John Boulder CO 1374-9735-1794-9711
Raw File (/raw/cards.csv)
Owner
Admins
Permission
Admins
Business Analyst Data Scientist
Name City State Credit Card #
Dave San Jose CA 1374-1111-1111-1111
John Boulder CO 1374-1111-1111-1111
Data Scientist View (/views/maskedcards.csv)
Not a physical data copy
Name City State
Dave San Jose CA
John Boulder CO
Business Analyst View
Owner
Admins
Permission
Business
Analysts
Owner
Admins
Permission
Data
Scientists
© 2015 MapR Technologies
Self-Service Data Exploration
Direct access to Hadoop data from familiar BI / Analytics tools- ANSI SQL compatible
Ad-hoc
Reporting
Queries
Raw Data
Exploration
Day Zero
queries
…
© 2015 MapR Technologies
Drill is a Distributed SQL query engine
drillbit
DataNode/Regi
onServer
drillbit
DataNode/Regi
onServer
drillbit
DataNode/Regi
onServer
ZooKeeper
ZooKeeper
ZooKeeper
…
 Scale out
 Columnar and Vectorized execution
 Optimistic and pipelined execution (no MR, Spark, Tez)
 Late binding
 Extensible
© 2015 MapR Technologies
- Sub-directory
- HBase namespace
- Hive database
Run SQL on Captures Directly
SELECT * FROM dfs.router1.`captures.json`
Workspace
- Pathnames
- Hive table
- HBase table
Table
- DFS (Text, Parquet, JSON)
- HBase/MapR-DB
- Hive Metastore/HCatalog
- Easy API to go beyond Hadoop
Storage plugin instance
© 2015 MapR Technologies
Network Analytics in a BI Context
• Getting results from BI tools requires SQL expertise
– Analytic techniques, visualizations, dashboarding
– Proprietary information about your operations
– Making sense of sources quickly
• New SQL-on-Hadoop (like Drill) technologies enable leveraging
this:
– To find new areas to gain value from combining your own proprietary
data with network sources
– Augment the analysis you’re doing now via use cases for packet data
you’re already storing in Hadoop
– Use data in real-time that’s too large to fit into memory and/or hits BI
tool limitations for analysis directly
© 2015 MapR Technologies
Hadoop Packet Processing Ecosystem
• Translating to various formats
– JSON
– CSV
– Parquet, others
• Packet ingestion
– Flume tcpdump source
– Direct from hardware vendors
• Northbound APIs
– Openstack and opendaylight
• More open source tools
– Packet processing in Pig, etc.
© 2015 MapR Technologies
Network Data Sources
• Data sources in the network are growing, changing
– Existing: tcpdump, SPAN, pcap
– New and more: SDN, NFV, REST APIs
• Often not suitable for analysis directly
– Requires building a schema
– ETL
– Structure is changing and evolving  ongoing management
– Large size, too big for memory
© 2015 MapR Technologies
REST APIs and JSON
• Self-describing data is common with REST APIs
– JSON
• Northbound APIs on almost everything in the network
– Enables access to many operational views
– But requires development work to pull it together
• SQL queries directly on the data is difficult
• Requires transformations, scripting, parsing
© 2015 MapR Technologies
View Drillbits information
in the cluster
© 2015 MapR Technologies
Manage storage plugin
instances through Web UI
© 2015 MapR Technologies
Monitor and
manage Drill queries
© 2015 MapR Technologies
See details of the query
© 2015 MapR Technologies
SAP Lumira and Wireshark Example -- Scenario
• Overview:
– Sensor data in JSON format being gathered multiple times daily from
remote locations
– Done over an IP network, each sensor has an IP address
• Problem
– One sensor is experiencing reading failures
– Network connectivity issues are suspected
• Solution Approach
– Take packet captures where we are reading sensors (central location) –
CSV-formatted Wireshark file
– Observe whether there are many TCP retransmissions happening
between the source and destination
– Ultimately, determine if the network is the problem and take action
© 2015 MapR Technologies
© 2015 MapR Technologies
Summary
• Using Drill from SAP Lumira, and the JDBC driver
– We compared data across multiple sources
• Notice we didn’t do any ETL
– Or define any schema for the network data
• Using existing ANSI SQL knowledge to query the data without
transformations
– Not just on the network data, but combined with other sources
• Self-service
© 2015 MapR Technologies
© 2015 MapR Technologies
Network Routing, OpenStack, JSON
• Link-state routing protocols (OSPF, IS-IS, Trill)
– Each participating node knows the topology of the entire network
– A dump of the database shows all nodes and adjacencies
– Physical and logical topology
– Other information (MPLS, etc.)
• OpenStack: pull networks, subnets, ports via REST API
– Use Drill Explorer to build a view
– Combine the data with device or customer information
• Enables visualizing the entire network quickly
© 2015 MapR Technologies
OpenStack Networking APIs Example
• JSON formatted responses
• Run queries without any data preparation
• Use of FLATTEN() for arbitrary maps
© 2015 MapR Technologies
FLATTEN()
• FLATTEN() is useful for exploration of data that is repeated
• Used on arrays
• Columns are repeated as necessary to maintain association with
each element of the array
• Example:
“host routes”: [
{
“destination” : “0.0.0.0/0”,
“nexthop”: “10.10.10.1”
},
{
“destination” : “192.168.10.0/24”,
“nexthop”: “192.168.0.1”,
},
…
]
© 2015 MapR Technologies
© 2015 MapR Technologies
TCP Round-Trip Times Example
• TCP RTT can affect customer experience in many ways
– Not just loading pages
– Also interactive, AJAX, forms, etc.
• Much of this can be calculated with other tools, then visualized
– Complex to calculate on your own
• Only a part of overall performance story, but helpful
– Example: switching network providers, adding caches or optimizers
© 2015 MapR Technologies
© 2015 MapR Technologies
Summary and Conclusions
• New SQL-on-Hadoop technologies enable network analysis in a BI
context
– Less time making schema, fewer requirements
– Easily supplement existing analysis
– Less need for specialized tools
• Apache Drill reduces the time required to get answers from network
data
– JSON analysis in place – interactive
– Queries and dashboards
– Integrated with BI tools out of the box
• Tableau, MicroStrategy, Qlikview, others
• More examples on github and YouTube
– mapr-demos

More Related Content

What's hot

Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
DataWorks Summit/Hadoop Summit
 
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DataWorks Summit
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
DataWorks Summit
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoop
hadooparchbook
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
DataWorks Summit/Hadoop Summit
 
October 2014 HUG : Hive On Spark
October 2014 HUG : Hive On SparkOctober 2014 HUG : Hive On Spark
October 2014 HUG : Hive On Spark
Yahoo Developer Network
 
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/PigHivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
DataWorks Summit/Hadoop Summit
 
Interactive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroDataInteractive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroData
Ofir Manor
 
Have your Cake and Eat it Too - Architecture for Batch and Real-time processing
Have your Cake and Eat it Too - Architecture for Batch and Real-time processingHave your Cake and Eat it Too - Architecture for Batch and Real-time processing
Have your Cake and Eat it Too - Architecture for Batch and Real-time processing
DataWorks Summit
 
Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and Parquet
DataWorks Summit
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific Applciations
Dr. Mirko Kämpf
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4j
DataWorks Summit
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache Spark
Cloudera, Inc.
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document Database
MapR Technologies
 
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
DataWorks Summit
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
DataWorks Summit
 
Node labels in YARN
Node labels in YARNNode labels in YARN
Node labels in YARN
Wangda Tan
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouse
DataWorks Summit/Hadoop Summit
 
Deep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profitDeep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profit
DataWorks Summit/Hadoop Summit
 
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and Future
DataWorks Summit
 

What's hot (20)

Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
 
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoop
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
October 2014 HUG : Hive On Spark
October 2014 HUG : Hive On SparkOctober 2014 HUG : Hive On Spark
October 2014 HUG : Hive On Spark
 
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/PigHivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
 
Interactive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroDataInteractive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroData
 
Have your Cake and Eat it Too - Architecture for Batch and Real-time processing
Have your Cake and Eat it Too - Architecture for Batch and Real-time processingHave your Cake and Eat it Too - Architecture for Batch and Real-time processing
Have your Cake and Eat it Too - Architecture for Batch and Real-time processing
 
Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and Parquet
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific Applciations
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4j
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache Spark
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document Database
 
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
 
Node labels in YARN
Node labels in YARNNode labels in YARN
Node labels in YARN
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouse
 
Deep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profitDeep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profit
 
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and Future
 

Viewers also liked

Dell | Your Path – Our Platform & Great Partnerships
Dell | Your Path – Our Platform & Great PartnershipsDell | Your Path – Our Platform & Great Partnerships
Dell | Your Path – Our Platform & Great Partnerships
DataWorks Summit
 
Algorithms of the heart
Algorithms of the heartAlgorithms of the heart
Algorithms of the heart
DataWorks Summit
 
Кружок
КружокКружок
Кружок
koneqq
 
Конвенция о правах ребенка
Конвенция о правах ребенкаКонвенция о правах ребенка
Конвенция о правах ребенка
koneqq
 
NBLSA-Pre-Law-Membership-Guide-2015-2016
NBLSA-Pre-Law-Membership-Guide-2015-2016NBLSA-Pre-Law-Membership-Guide-2015-2016
NBLSA-Pre-Law-Membership-Guide-2015-2016Charmika A. Placide
 
Leadership in the sixth wave
Leadership in the sixth waveLeadership in the sixth wave
Leadership in the sixth wave
Thuy Tran
 
Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Enterprise-Grade Rolling Upgrade for a Live Hadoop ClusterEnterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
DataWorks Summit
 
Etimology
EtimologyEtimology
Etimology
Andrea Izzo
 
Pro and cons of quoting a small cap
Pro and cons of quoting a small capPro and cons of quoting a small cap
Pro and cons of quoting a small capAntevenio S.A
 
Biotecnologia
BiotecnologiaBiotecnologia
Biotecnologia
brisasescorial
 
Spark After Dark: Real-time, Advanced Analytics with Spark
Spark After Dark: Real-time, Advanced Analytics with SparkSpark After Dark: Real-time, Advanced Analytics with Spark
Spark After Dark: Real-time, Advanced Analytics with Spark
DataWorks Summit
 
Unit 3 vocab
Unit 3 vocabUnit 3 vocab
Unit 3 vocab
Shannon Gilliland
 
ρατσισμος
ρατσισμοςρατσισμος
Use of l1 at primary level in l2 learning class room
Use of l1 at primary level in l2 learning class roomUse of l1 at primary level in l2 learning class room
Use of l1 at primary level in l2 learning class roommuhammad asif
 
Экология озер Ново-савиновского района г. Казани
Экология озер Ново-савиновского района г. КазаниЭкология озер Ново-савиновского района г. Казани
Экология озер Ново-савиновского района г. Казани
koneqq
 
Marketing Social para una sociedad responsable
Marketing Social para una sociedad responsableMarketing Social para una sociedad responsable
Marketing Social para una sociedad responsable
Paco Lorente
 
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared ClustersMercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
DataWorks Summit
 
Archetypes in Branding
Archetypes in Branding Archetypes in Branding
Archetypes in Branding
Margaret Hartwell
 
L1 use in the L2 classroom
L1 use in the L2 classroomL1 use in the L2 classroom
L1 use in the L2 classroom
richpemberton
 

Viewers also liked (19)

Dell | Your Path – Our Platform & Great Partnerships
Dell | Your Path – Our Platform & Great PartnershipsDell | Your Path – Our Platform & Great Partnerships
Dell | Your Path – Our Platform & Great Partnerships
 
Algorithms of the heart
Algorithms of the heartAlgorithms of the heart
Algorithms of the heart
 
Кружок
КружокКружок
Кружок
 
Конвенция о правах ребенка
Конвенция о правах ребенкаКонвенция о правах ребенка
Конвенция о правах ребенка
 
NBLSA-Pre-Law-Membership-Guide-2015-2016
NBLSA-Pre-Law-Membership-Guide-2015-2016NBLSA-Pre-Law-Membership-Guide-2015-2016
NBLSA-Pre-Law-Membership-Guide-2015-2016
 
Leadership in the sixth wave
Leadership in the sixth waveLeadership in the sixth wave
Leadership in the sixth wave
 
Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Enterprise-Grade Rolling Upgrade for a Live Hadoop ClusterEnterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
 
Etimology
EtimologyEtimology
Etimology
 
Pro and cons of quoting a small cap
Pro and cons of quoting a small capPro and cons of quoting a small cap
Pro and cons of quoting a small cap
 
Biotecnologia
BiotecnologiaBiotecnologia
Biotecnologia
 
Spark After Dark: Real-time, Advanced Analytics with Spark
Spark After Dark: Real-time, Advanced Analytics with SparkSpark After Dark: Real-time, Advanced Analytics with Spark
Spark After Dark: Real-time, Advanced Analytics with Spark
 
Unit 3 vocab
Unit 3 vocabUnit 3 vocab
Unit 3 vocab
 
ρατσισμος
ρατσισμοςρατσισμος
ρατσισμος
 
Use of l1 at primary level in l2 learning class room
Use of l1 at primary level in l2 learning class roomUse of l1 at primary level in l2 learning class room
Use of l1 at primary level in l2 learning class room
 
Экология озер Ново-савиновского района г. Казани
Экология озер Ново-савиновского района г. КазаниЭкология озер Ново-савиновского района г. Казани
Экология озер Ново-савиновского района г. Казани
 
Marketing Social para una sociedad responsable
Marketing Social para una sociedad responsableMarketing Social para una sociedad responsable
Marketing Social para una sociedad responsable
 
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared ClustersMercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
 
Archetypes in Branding
Archetypes in Branding Archetypes in Branding
Archetypes in Branding
 
L1 use in the L2 classroom
L1 use in the L2 classroomL1 use in the L2 classroom
L1 use in the L2 classroom
 

Similar to Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks

Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Dataconomy Media
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Mats Uddenfeldt
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
ssuserd3a367
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About
Jesus Rodriguez
 
An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...
DataWorks Summit
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
Apache drill
Apache drillApache drill
Apache drill
MapR Technologies
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companies
DataWorks Summit
 
Building Fast Applications for Streaming Data
Building Fast Applications for Streaming DataBuilding Fast Applications for Streaming Data
Building Fast Applications for Streaming Data
freshdatabos
 
IBM Internet-of-Things architecture and capabilities
IBM Internet-of-Things architecture and capabilitiesIBM Internet-of-Things architecture and capabilities
IBM Internet-of-Things architecture and capabilities
IBM_Info_Management
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Precisely
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
markgrover
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
N Masahiro
 
How to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentHow to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environment
BlueData, Inc.
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
SATOSHI TAGOMORI
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
IdontKnow66967
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Alluxio, Inc.
 
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014
John Berns
 

Similar to Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks (20)

Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
Design patternsforiot
Design patternsforiotDesign patternsforiot
Design patternsforiot
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About
 
An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Apache drill
Apache drillApache drill
Apache drill
 
AnilKumarT_Resume_latest
AnilKumarT_Resume_latestAnilKumarT_Resume_latest
AnilKumarT_Resume_latest
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companies
 
Building Fast Applications for Streaming Data
Building Fast Applications for Streaming DataBuilding Fast Applications for Streaming Data
Building Fast Applications for Streaming Data
 
IBM Internet-of-Things architecture and capabilities
IBM Internet-of-Things architecture and capabilitiesIBM Internet-of-Things architecture and capabilities
IBM Internet-of-Things architecture and capabilities
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
How to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentHow to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environment
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
 
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 

Recently uploaded (20)

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 

Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks

  • 1. © 2015 MapR Technologies© 2015 MapR Technologies Exploring Enterprise Networks with Familiar BI Tools
  • 2. © 2015 MapR Technologies On the Menu • Discovery: why Hadoop + BI tools for analyzing networks? • Network analysis in a BI context • Apache Drill • Connecting BI tools to network data • Practical examples with Drill and BI – Querying packets with Tableau – Troubleshooting with SAP Lumira – Gaining insight into customer experience across multiple sources – Using built-in Drill features for faster analysis • Summary, conclusions, more resources
  • 3. © 2015 MapR Technologies Topics not covered in detail… • Packet capture architectures • Ways to capture packets effectively • Large-scale packet processing – others have done this • Comparison of BI tools • Survey of the best SQL-on-Hadoop technology
  • 4. © 2015 MapR Technologies There’s a lot happening in your network… • Packets, logs, interconnections • Many layers (L1-L7), “L8” • Network data is multi-faceted… – It’s serialized and highly structured – It facilitates communication between heterogeneous devices via common protocols – But it’s not structured to be stored and analyzed – The application often doesn’t care – Consequently, specialized tooling and software is required
  • 5. © 2015 MapR Technologies Why Hadoop + BI tools? • What does Hadoop enable that makes it a powerful tool for network analytics? • What’s new that wasn’t previous possible/desirable? • How does it augment existing solutions? • It’s many things: – New ways of accessing semi-structured data from the network – Offloading of existing data warehouses and tools – Combining, joining, blending network captures with other sources – Many network tools cannot answer questions about your business and customers – You can use SQL to get a lot of the answers you need
  • 6. © 2015 MapR Technologies New Data Sources Unlock New Insights & Apps Existing structured data • Well-defined and well- understood schema – OLTP data – Data warehouse data – End user data stores (e.g., Excel) New multi-structured data • Typically un-modeled, different in format – Network data – Clickstream data – Sensor data – Rich media (e.g., audio, video) – Documents … both types are needed today for deeper insights
  • 7. © 2015 MapR Technologies 1980 2000 20101990 2020 Fixed schema DBA controls structure Dynamic / Flexible schema Application controls structure NON-RELATIONAL DATASTORESRELATIONAL DATABASES GBs-TBs TBs-PBsVolume Database Network data, like other data, is increasingly Stored in Non- Relational Datastores Structure Development Structured Structured, semi-structured and unstructured Planned (release cycle = months-years) Iterative (release cycle = days-weeks)
  • 8. © 2015 MapR Technologies Apache Drill Brings Flexibility & Performance Access to any data type, any data source • Relational • Nested data • Schema-less Rapid time to insights • Query data in-situ • No Schemas required • Easy to get started Integration with existing tools • ANSI SQL • BI tool integration Scale in all dimensions • TB-PB of scale • 1000’s of users • 1000’s of nodes Granular Security • Authentication • Row/column level controls • De-centralized
  • 9. © 2015 MapR Technologies Granular security permissions through Drill views Name City State Credit Card # Dave San Jose CA 1374-7914-3865-4817 John Boulder CO 1374-9735-1794-9711 Raw File (/raw/cards.csv) Owner Admins Permission Admins Business Analyst Data Scientist Name City State Credit Card # Dave San Jose CA 1374-1111-1111-1111 John Boulder CO 1374-1111-1111-1111 Data Scientist View (/views/maskedcards.csv) Not a physical data copy Name City State Dave San Jose CA John Boulder CO Business Analyst View Owner Admins Permission Business Analysts Owner Admins Permission Data Scientists
  • 10. © 2015 MapR Technologies Self-Service Data Exploration Direct access to Hadoop data from familiar BI / Analytics tools- ANSI SQL compatible Ad-hoc Reporting Queries Raw Data Exploration Day Zero queries …
  • 11. © 2015 MapR Technologies Drill is a Distributed SQL query engine drillbit DataNode/Regi onServer drillbit DataNode/Regi onServer drillbit DataNode/Regi onServer ZooKeeper ZooKeeper ZooKeeper …  Scale out  Columnar and Vectorized execution  Optimistic and pipelined execution (no MR, Spark, Tez)  Late binding  Extensible
  • 12. © 2015 MapR Technologies - Sub-directory - HBase namespace - Hive database Run SQL on Captures Directly SELECT * FROM dfs.router1.`captures.json` Workspace - Pathnames - Hive table - HBase table Table - DFS (Text, Parquet, JSON) - HBase/MapR-DB - Hive Metastore/HCatalog - Easy API to go beyond Hadoop Storage plugin instance
  • 13. © 2015 MapR Technologies Network Analytics in a BI Context • Getting results from BI tools requires SQL expertise – Analytic techniques, visualizations, dashboarding – Proprietary information about your operations – Making sense of sources quickly • New SQL-on-Hadoop (like Drill) technologies enable leveraging this: – To find new areas to gain value from combining your own proprietary data with network sources – Augment the analysis you’re doing now via use cases for packet data you’re already storing in Hadoop – Use data in real-time that’s too large to fit into memory and/or hits BI tool limitations for analysis directly
  • 14. © 2015 MapR Technologies Hadoop Packet Processing Ecosystem • Translating to various formats – JSON – CSV – Parquet, others • Packet ingestion – Flume tcpdump source – Direct from hardware vendors • Northbound APIs – Openstack and opendaylight • More open source tools – Packet processing in Pig, etc.
  • 15. © 2015 MapR Technologies Network Data Sources • Data sources in the network are growing, changing – Existing: tcpdump, SPAN, pcap – New and more: SDN, NFV, REST APIs • Often not suitable for analysis directly – Requires building a schema – ETL – Structure is changing and evolving  ongoing management – Large size, too big for memory
  • 16. © 2015 MapR Technologies REST APIs and JSON • Self-describing data is common with REST APIs – JSON • Northbound APIs on almost everything in the network – Enables access to many operational views – But requires development work to pull it together • SQL queries directly on the data is difficult • Requires transformations, scripting, parsing
  • 17. © 2015 MapR Technologies View Drillbits information in the cluster
  • 18. © 2015 MapR Technologies Manage storage plugin instances through Web UI
  • 19. © 2015 MapR Technologies Monitor and manage Drill queries
  • 20. © 2015 MapR Technologies See details of the query
  • 21. © 2015 MapR Technologies SAP Lumira and Wireshark Example -- Scenario • Overview: – Sensor data in JSON format being gathered multiple times daily from remote locations – Done over an IP network, each sensor has an IP address • Problem – One sensor is experiencing reading failures – Network connectivity issues are suspected • Solution Approach – Take packet captures where we are reading sensors (central location) – CSV-formatted Wireshark file – Observe whether there are many TCP retransmissions happening between the source and destination – Ultimately, determine if the network is the problem and take action
  • 22. © 2015 MapR Technologies
  • 23. © 2015 MapR Technologies Summary • Using Drill from SAP Lumira, and the JDBC driver – We compared data across multiple sources • Notice we didn’t do any ETL – Or define any schema for the network data • Using existing ANSI SQL knowledge to query the data without transformations – Not just on the network data, but combined with other sources • Self-service
  • 24. © 2015 MapR Technologies
  • 25. © 2015 MapR Technologies Network Routing, OpenStack, JSON • Link-state routing protocols (OSPF, IS-IS, Trill) – Each participating node knows the topology of the entire network – A dump of the database shows all nodes and adjacencies – Physical and logical topology – Other information (MPLS, etc.) • OpenStack: pull networks, subnets, ports via REST API – Use Drill Explorer to build a view – Combine the data with device or customer information • Enables visualizing the entire network quickly
  • 26. © 2015 MapR Technologies OpenStack Networking APIs Example • JSON formatted responses • Run queries without any data preparation • Use of FLATTEN() for arbitrary maps
  • 27. © 2015 MapR Technologies FLATTEN() • FLATTEN() is useful for exploration of data that is repeated • Used on arrays • Columns are repeated as necessary to maintain association with each element of the array • Example: “host routes”: [ { “destination” : “0.0.0.0/0”, “nexthop”: “10.10.10.1” }, { “destination” : “192.168.10.0/24”, “nexthop”: “192.168.0.1”, }, … ]
  • 28. © 2015 MapR Technologies
  • 29. © 2015 MapR Technologies TCP Round-Trip Times Example • TCP RTT can affect customer experience in many ways – Not just loading pages – Also interactive, AJAX, forms, etc. • Much of this can be calculated with other tools, then visualized – Complex to calculate on your own • Only a part of overall performance story, but helpful – Example: switching network providers, adding caches or optimizers
  • 30. © 2015 MapR Technologies
  • 31. © 2015 MapR Technologies Summary and Conclusions • New SQL-on-Hadoop technologies enable network analysis in a BI context – Less time making schema, fewer requirements – Easily supplement existing analysis – Less need for specialized tools • Apache Drill reduces the time required to get answers from network data – JSON analysis in place – interactive – Queries and dashboards – Integrated with BI tools out of the box • Tableau, MicroStrategy, Qlikview, others • More examples on github and YouTube – mapr-demos