C97-717209-00 © 2012 Cisco and/or its affiliates. All rights reserved.
1. Intro
2. Problem to solve?
3. How does Flume/Solr help?
4. Syslog indexing example
5. HA, DR & scalability
Ops Architect at Cisco CCATG (WebEx)
Ensure operational readiness for complex distributed services
HA, DR, monitoring, config, deployment
Previously eBay, Excite@Home, IBM, VISA
Operations architecture, monitoring, event correlation
Cisco WebEx Meetings
• Voice, video, desktop sharing
• Meeting/Event/Support/Training Centers
• Integration with TelePresence
Cisco WebEx Social
• Social networking
• Content creation
• Integrated IM
Cisco WebEx Messenger
• IM, presence
• Integrate with voice, video
• XMPP
Participants from over 231 countries, 52% market share
2.2 Billion meeting minutes per month
40.5 Million meeting attendees per month
9.4 million registered hosts worldwide
4 Million mobile downloads
[Map: datacenters / PoPs connected by leased network links]
Global scale: 13 datacenters & iPoPs around the globe
Dedicated network: dual-path 10G circuits between DCs
Multi-tenant: 95k sites
Real-time collaboration: voice, desktop sharing, video, chat
People make mistakes
Hardware fails
Software fails
Even failovers sometimes fail
“If a problem has no solution, it may not be a problem, but a fact, not to be solved, but to be coped with over time”
- Shimon Peres (“Peres’s Law”)
People/HW/SW failures are facts, not problems
Operations' main goal is to maintain high service availability
• Recovery/repair is how we cope with the above facts
• Improving recovery/repair improves availability
Unavailability = MTTR / MTBF
Cutting MTTR to 1/10th is just as valuable as a 10x increase in MTBF
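The ratio above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the MTTR and MTBF values are made-up numbers, not figures from the talk.

```python
def unavailability(mttr_hours: float, mtbf_hours: float) -> float:
    """Fraction of time the service is down, using the slide's ratio MTTR / MTBF."""
    return mttr_hours / mtbf_hours


# Hypothetical baseline: 2 hours to repair, one failure every 1000 hours.
baseline = unavailability(mttr_hours=2.0, mtbf_hours=1000.0)

# Option A: repair ten times faster.
faster_repair = unavailability(mttr_hours=0.2, mtbf_hours=1000.0)

# Option B: fail ten times less often.
fewer_failures = unavailability(mttr_hours=2.0, mtbf_hours=10000.0)

print(f"baseline:        {baseline:.4%}")
print(f"1/10th MTTR:     {faster_repair:.4%}")
print(f"10x MTBF:        {fewer_failures:.4%}")
```

Both options cut unavailability by the same factor of ten, which is the argument for investing in faster diagnosis and recovery.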
Even better: proactive
Good: reactive
Your search – What is the root cause of the outage? – did not match any documents.
[Architecture diagram. Labels: Flume with Log4j, File, Avro, Syslog, Thrift, AMQP and HTTP/REST; Solr sink, HDFS sink and other sinks; Sqoop with RDBMS and MySQL; application state & APIs. Data classes: unstructured/semi-structured data and structured data. Stores: raw data in HDFS, Solr index in SolrCloud.]
Cisco UCS C240 M3 servers
12 x 3 TB = 36 TB / server
[Diagram: two-tier Flume topology. Collector tier: Flume collectors in every datacenter (DC 1 … DC N) receive syslog, log4j and file events. Storage tier: Flume servers in DC 1 and DC 2 write to HDFS and SolrCloud.]
[Diagram: Flume Collector server]
• Agents send events to the collector's Avro source, with failover & load balancing
• Replicating fan-out flow: all events are replicated to both File Channel 1 and File Channel 2
• One Avro sink per channel forwards events to the Flume Storage tier in DC1 and DC2
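The replicating fan-out can be pictured with a toy in-memory model. This is a sketch in Python, not Flume configuration or Flume code: the class `ReplicatingSource` and the queue names are hypothetical, and a `deque` stands in for a durable file channel.

```python
from collections import deque


class ReplicatingSource:
    """Toy stand-in for a source whose selector copies every event to all channels."""

    def __init__(self, channels):
        self.channels = channels

    def put(self, event):
        # Replicating fan-out: each channel gets its own copy of the event.
        for channel in self.channels:
            channel.append(event)


# One channel per destination datacenter.
channel_dc1 = deque()
channel_dc2 = deque()
source = ReplicatingSource([channel_dc1, channel_dc2])

for event in ["event-a", "event-b"]:
    source.put(event)

# Simulate DC2 being unreachable: only the DC1 sink drains its channel.
delivered_dc1 = [channel_dc1.popleft() for _ in range(len(channel_dc1))]

print("delivered to DC1:", delivered_dc1)
print("still buffered for DC2:", list(channel_dc2))
```

Because each destination has its own channel, a slow or unavailable datacenter only delays its own copy of the data; events for the other datacenter keep flowing.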
[Diagram: Flume Storage tier server]
• Flume Collectors send events to the Avro source, with failover & load balancing
• Multiplexing fan-out flow into File Channel 1 and File Channel 2, drained by a Solr sink (to SolrCloud) and an HDFS sink (to HDFS)
• Routing to Solr by Flume event header
• All events to HDFS
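The routing rule (every event to HDFS, only selected events to Solr) can be sketched as a small Python function. The header name `index`, the value `syslog` and the channel names are assumptions made for illustration; the talk does not say which header is used.

```python
def route(event, header="index", solr_values=("syslog",)):
    """Return the channels an event is written to.

    Every event goes to the HDFS channel; events whose routing header
    matches one of `solr_values` are also written to the Solr channel.
    """
    channels = ["hdfs_channel"]
    if event["headers"].get(header) in solr_values:
        channels.append("solr_channel")
    return channels


syslog_event = {"headers": {"index": "syslog"}, "body": "<179>Jun 16 04:36:49 ..."}
other_event = {"headers": {}, "body": "application log line"}

print(route(syslog_event))
print(route(other_event))
```

Keeping HDFS as the unconditional destination means the raw data is always available for later batch indexing, even for events that were never routed to Solr.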
Isn’t Big Data “schema on read”?
• Why does Solr require a schema on write?
• Dirty little secret: there’s always a schema
• Performance & functionality vs flexibility
• Optimize operations and storage based on field type - that's how you get sub-second response times
There’s always a schema
• Application code vs. central location
Cloudera Morphlines
• Framework to simplify event transformation
• Compatible with existing grok patterns
• Reusable across multiple index workloads: Flume & M/R
[Diagram: inside the SolrSink, a Flume event (headers + body) becomes a Record that is passed through a chain of Morphline commands (readLine, grok, tryRules, addValues, …, loadSolr) and arrives in Solr as a document matching schema.xml]
Convert syslog message ...
<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234
... into Solr schema fields
Severity=[3]
Facility=[22]
host=[colo01-wxp00-ace01b-connect.webex.com]
timestamp=[2013-06-16T04:36:49.000Z]
syslog_message=[%ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]
severity_label=[error]
access_token=[54asdf654]
id=[b2f839c3-dece-404f-a535-e0141ad549bf]
cisco_product=[ACE]
cisco_level=[3]
cisco_id=[251008]
cisco_code=[%ACE-3-251008]
Convert syslog message
<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234
Step 1: readLine reads in Flume event headers and body
Headers:
timestamp=[1371357409000]
host=[colo01-wxp00-ace01b-connect.webex.com]
category=[545f5sfsd5sf]
Severity=[3]
Facility=[22]
Body:
message=[<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]
Step 2: convertTimestamp converts epoch to ISO 8601 format
timestamp=[2013-06-16T04:36:49.000Z]
host=[colo01-wxp00-ace01b-connect.webex.com]
category=[545f5sfsd5sf]
message=[<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]
Severity=[3]
Facility=[22]
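The conversion in this step can be reproduced outside Morphlines. The Python sketch below is not the convertTimestamp command itself; it only shows the same epoch-milliseconds to ISO 8601 (UTC) arithmetic, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def epoch_millis_to_iso8601(millis: int) -> str:
    """Format epoch milliseconds as an ISO 8601 UTC timestamp with millisecond precision."""
    seconds, remainder_ms = divmod(millis, 1000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S") + f".{remainder_ms:03d}Z"


# The timestamp header from Step 1.
print(epoch_millis_to_iso8601(1371357409000))  # 2013-06-16T04:36:49.000Z
```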
Step 3: addValues creates new field access_token
timestamp=[2013-06-16T04:36:49.000Z]
category=[545f5sfsd5sf]
access_token=[545f5sfsd5sf]
message=[<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]
host=[colo01-wxp00-ace01b-connect.webex.com]
Severity=[3]
Facility=[22]
Step 4: tryRules creates field severity_label for severity
timestamp=[2013-06-16T04:36:49.000Z]
severity_label=[error]
access_token=[545f5sfsd5sf]
message=[<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]
host=[colo01-wxp00-ace01b-connect.webex.com]
category=[545f5sfsd5sf]
Severity=[3]
Facility=[22]
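For readers unfamiliar with syslog priorities, the sketch below shows where Severity=[3], Facility=[22] and the label "error" come from. It is plain Python, not the tryRules configuration used in the talk; the function name is hypothetical, and the label spellings follow the standard syslog severity names in lowercase.

```python
import re

# Standard syslog severity names, indexed by severity code 0..7.
SEVERITY_LABELS = (
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
)


def decode_pri(message: str):
    """Split the leading <PRI> value into (facility, severity): PRI = facility * 8 + severity."""
    match = re.match(r"<(\d{1,3})>", message)
    if match is None:
        raise ValueError("message does not start with a <PRI> field")
    return divmod(int(match.group(1)), 8)


facility, severity = decode_pri("<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com")
print(facility, severity, SEVERITY_LABELS[severity])  # 22 3 error
```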
Step 5: tryRules creates new fields
syslog_message=[%ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]
cisco_product=[ACE]
cisco_level=[3]
cisco_id=[251008]
cisco_code=[%ACE-3-251008]
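The field extraction in this step is pattern matching on the message body. The sketch below uses a Python regular expression instead of the grok patterns from the talk, which are not shown in the slides. The pattern is an assumption that fits the example message (numeric message id); other Cisco message formats may need a looser pattern.

```python
import re

# Assumed pattern: %<PRODUCT>-<LEVEL>-<ID>: as in "%ACE-3-251008:".
CISCO_CODE = re.compile(
    r"(?P<cisco_code>%(?P<cisco_product>[A-Z0-9_]+)-(?P<cisco_level>\d)-(?P<cisco_id>\d+)):"
)


def extract_cisco_fields(message: str) -> dict:
    """Return the Cisco-specific fields found in a syslog body, or {} if none match."""
    match = CISCO_CODE.search(message)
    if match is None:
        return {}
    fields = match.groupdict()
    # Everything from the Cisco code to the end of the line is the message text.
    fields["syslog_message"] = message[match.start():]
    return fields


body = (
    "<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : "
    "%ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234"
)
for name, value in extract_cisco_fields(body).items():
    print(f"{name}=[{value}]")
```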
Step 6: sanitizeUnknownSolrFields drops non-schema fields
timestamp=[2013-06-16T04:36:49.000Z]
syslog_message=[%ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]
severity_label=[error]
access_token=[545f5sfsd5sf]
host=[colo01-wxp00-ace01b-connect.webex.com]
cisco_product=[ACE]
cisco_level=[3]
cisco_id=[251008]
cisco_code=[%ACE-3-251008]
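The effect of this step is a filter against the schema's field names. The sketch below models it in Python; `SCHEMA_FIELDS` is an assumed field list inferred from the records shown in the slides, not the actual schema.xml.

```python
# Assumed schema field names, inferred from the records shown in the slides.
SCHEMA_FIELDS = frozenset({
    "id", "timestamp", "host", "syslog_message", "severity_label",
    "access_token", "cisco_product", "cisco_level", "cisco_id", "cisco_code",
})


def sanitize_unknown_fields(record: dict, schema_fields=SCHEMA_FIELDS) -> dict:
    """Keep only the fields the schema knows about, so the document is accepted on load."""
    return {name: value for name, value in record.items() if name in schema_fields}


record = {
    "timestamp": "2013-06-16T04:36:49.000Z",
    "host": "colo01-wxp00-ace01b-connect.webex.com",
    "category": "545f5sfsd5sf",
    "access_token": "545f5sfsd5sf",
    "Severity": "3",
    "Facility": "22",
    "severity_label": "error",
    "message": "<179>Jun 16 04:36:49 ...",
}
print(sorted(sanitize_unknown_fields(record)))
```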
Step 7: generateUUID creates a unique id for the document
timestamp=[2013-06-16T04:36:49.000Z]
syslog_message=[%ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]
severity_label=[error]
access_token=[545f5sfsd5sf]
id=[b2f839c3-dece-404f-a535-e0141ad549bf]
host=[colo01-wxp00-ace01b-connect.webex.com]
cisco_product=[ACE]
cisco_level=[3]
cisco_id=[251008]
cisco_code=[%ACE-3-251008]
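A minimal Python equivalent of this step, assuming a random (version 4) UUID like the one shown in the example; the function name and the `id` field default are illustrative.

```python
import uuid


def with_generated_id(record: dict, field: str = "id") -> dict:
    """Return a copy of the record with a random UUID in `field` unless one is already set."""
    document = dict(record)
    if field not in document:
        document[field] = str(uuid.uuid4())
    return document


document = with_generated_id({"host": "colo01-wxp00-ace01b-connect.webex.com"})
print(document["id"])
```

Log lines have no natural key, so a generated id gives every document the unique key the index needs.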
Step 8: loadSolr loads a record into a Solr server
[Diagram: end-to-end recap. Inside the SolrSink, a Flume syslog event (headers + body) becomes a Record, passes through the Morphline commands (readLine, grok, tryRules, addValues, …, loadSolr) and is loaded into SolrCloud as a document matching schema.xml]
[Diagram: SolrCloud cluster coordinated by a ZooKeeper ensemble (zk1, zk2, zk3). One Collection is split into Shard1, Shard2 and Shard3, each with a leader and a replica, on a pluggable filesystem (local, HDFS). Callouts: "Add doc to syslog index", "Where can I index data?", "leader3".]
• Collections, shards & replicas
• Pluggable file system
• Central config & coordination with ZK
• Full HA, automatic fail-over
• NRT indexing
• Automatic routing
Collection “syslog” with three shards
Special case of search
• Logs are time series data: timestamp + data
• High indexing rate, no updates
• New data is searched more frequently than old
Collection aliases
• Time-partitioned collections – e.g. one collection per day
• Reduces the workload to near-real-time data only
• One-to-many collection mapping: queries go to a logical name mapped to multiple same-schema collections
• Simplifies hot-warm-cold migration of data
Index expiration
• Old data is aged out through collection aliases
• Remap only the latest collections to an alias
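The alias scheme can be sketched as follows: one collection per day, and an alias that is re-pointed daily at the most recent few. The collection prefix `syslog`, the alias name `syslog_recent` and the retention window are hypothetical. The query string targets the Solr Collections API `CREATEALIAS` action; check the parameter details against the Solr version in use.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def daily_collection(day: date, prefix: str = "syslog") -> str:
    """Name of the collection holding one day of logs, e.g. syslog_20130616."""
    return f"{prefix}_{day:%Y%m%d}"


def alias_members(today: date, days_to_keep: int, prefix: str = "syslog") -> list:
    """Collections the alias should cover: today and the previous days_to_keep - 1 days."""
    return [daily_collection(today - timedelta(days=n), prefix) for n in range(days_to_keep)]


def create_alias_query(alias: str, collections: list) -> str:
    """Query string for re-pointing an alias at the given collections."""
    return urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    })


members = alias_members(date(2013, 6, 16), days_to_keep=3)
print(members)
print(create_alias_query("syslog_recent", members))
```

Searches always go to the alias, so expiring a day of data amounts to leaving its collection out of the next remap; the collection itself can then be archived or dropped separately.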
Solr
• No multi-datacenter cluster support
HDFS
• No multi-datacenter cluster support
Options?
• All our services must survive DC outage
• ... so should logging and indexing
[Diagram: collector tier (DC 1, DC 2 … DC N; syslog, log4j, file) and storage tier (Flume, HDFS and SolrCloud in DC 1 and DC 2) during a storage tier outage]
• Planned or unplanned outage
• Flume Collector disk channel buffering DC1 events
• DC1 Hadoop cluster back online after outage
• Replicate aggregate data
[Diagram: alternative DR flow on the same collector/storage topology, with data sent to one DC and copied to the other. Callouts, in slide order:]
• distcp
• Manual CNAME change to DC2
• DC1 back online, sync data from DC2
• Data sent only to a single DC
• distcp
• DNS CNAME change back to DC1
• Flip distcp the other way
• Flume buffering events at collector tier
• Create indexes with M/R
Tiers to scale
• Flume Collector tier
• Flume Storage tier
• SolrCloud
100–5000 servers per datacenter
[Diagram: Flume Collector servers, each with an Avro source, a replicating fan-out flow into File Channel 1 and File Channel 2, and Avro sinks to DC1 and DC2. More agents and data are handled by adding collectors.]
• FileChannel: 14 MB/sec
• NIC: 100 MB/sec
Max per server: 14 MB/s, 1.2 TB/day, 70k events/s
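The per-server figures are consistent with each other, as the arithmetic below shows. The 100 MB/s aggregate rate in the last line is a made-up number used only to illustrate sizing the collector tier; decimal units (1 TB = 1,000,000 MB) are assumed.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def tb_per_day(mb_per_sec: float) -> float:
    """Daily volume in TB for a sustained rate in MB/s (decimal units)."""
    return mb_per_sec * SECONDS_PER_DAY / 1_000_000


def average_event_bytes(mb_per_sec: float, events_per_sec: float) -> float:
    """Average event size implied by a byte rate and an event rate."""
    return mb_per_sec * 1_000_000 / events_per_sec


def collectors_needed(total_mb_per_sec: float, per_server_mb_per_sec: float = 14) -> int:
    """Collector servers needed for a given aggregate rate, ignoring headroom."""
    return math.ceil(total_mb_per_sec / per_server_mb_per_sec)


print(tb_per_day(14))                    # about 1.2 TB/day
print(average_event_bytes(14, 70_000))   # 200 bytes per event
print(collectors_needed(100))            # hypothetical 100 MB/s aggregate
```

The file channel (14 MB/s), not the NIC (100 MB/s), is the bottleneck in these figures, so the collector tier scales by adding servers.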
[Diagram: collectors in DC 1, DC 2 … DC N each run Avro sinks 1 … N that send to the Flume storage tier servers in DC 1 and DC 2. Each storage tier server has an Avro source, a multiplexing fan-out flow into File Chan1 and File Chan2, an HDFS sink and a Solr sink.]
Max per server: 14 MB/s, 1.2 TB/day, 70k events/s
[Diagram: SolrCloud cluster with a ZooKeeper ensemble (zk1, zk2, zk3) and Shard1, Shard2 and Shard3, each with a leader and a replica, on a pluggable filesystem (local, HDFS). New logs to index and search queries both arrive at the cluster.]
• 1000 tx/sec/core
• 2 x 8 cores: 16k tx/sec
• 3 shards: 3 x 16k = 48k tx/sec
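The sizing estimate above is a straight multiplication, which assumes indexing throughput scales linearly with cores and shards. The sketch below reproduces it and inverts it to estimate shards for a target rate; the 70k events/s target is borrowed from the Flume per-server figure purely as an example.

```python
import math


def cluster_index_rate(tx_per_core: int, cores_per_server: int, shards: int) -> int:
    """Estimated indexing rate, assuming linear scaling across cores and shards."""
    return tx_per_core * cores_per_server * shards


def shards_needed(target_tx_per_sec: int, tx_per_core: int, cores_per_server: int) -> int:
    """Shards required to reach a target rate under the same linear assumption."""
    return math.ceil(target_tx_per_sec / (tx_per_core * cores_per_server))


print(cluster_index_rate(tx_per_core=1000, cores_per_server=16, shards=3))  # 48000
print(shards_needed(70_000, tx_per_core=1000, cores_per_server=16))
```

Search load and replication are not part of this estimate, so it is better read as an upper bound than as a capacity plan.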
Central syslog servers
• Network and OS system messages forwarded to several central syslog servers
Forward syslog to Solr using Flume Morphline SolrSink
• Parse messages with Morphline and grok patterns
SolrCloud
• Index log lines as documents into a Collection (i.e. index)
HUE Solr search
• Simple UI to build a customized search page layout with faceting and sorting
• Easy drill-down with multiple facets: severity, datacenter, hostname, etc.
Screen shots
Search by time
Sort by selected field
Facets by selected fields
Wildcard query by field
Highlight the query keywords
Data sources: REST/JSON, log4j, syslog, Avro, Thrift
Parsing: Cloudera Morphlines
NRT Indexing: SolrCloud embedded in CDH
Batch indexing: MapReduce
Analytics: Use your favorite tool, raw detailed data stored in HDFS
email: ari.flink@webex.com
twitter: @raaka
Thank you.

Ā 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
Ā 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
Ā 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
Ā 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
Ā 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
Ā 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
Ā 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
Ā 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
Ā 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
Ā 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
Ā 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
Ā 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
Ā 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
Ā 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
Ā 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
Ā 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
Ā 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
Ā 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
Ā 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Ā 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
Ā 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Ā 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
Ā 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
Ā 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
Ā 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Ā 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Ā 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Ā 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
Ā 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
Ā 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Ā 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
Ā 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Ā 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Ā 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Ā 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
Ā 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Ā 

Recently uploaded

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
Ā 
šŸ¬ The future of MySQL is Postgres šŸ˜
šŸ¬  The future of MySQL is Postgres   šŸ˜šŸ¬  The future of MySQL is Postgres   šŸ˜
šŸ¬ The future of MySQL is Postgres šŸ˜RTylerCroy
Ā 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
Ā 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
Ā 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
Ā 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
Ā 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
Ā 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
Ā 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
Ā 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
Ā 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
Ā 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
Ā 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
Ā 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
Ā 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel AraĆŗjo
Ā 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
Ā 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
Ā 
Finology Group ā€“ Insurtech Innovation Award 2024
Finology Group ā€“ Insurtech Innovation Award 2024Finology Group ā€“ Insurtech Innovation Award 2024
Finology Group ā€“ Insurtech Innovation Award 2024The Digital Insurer
Ā 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
Ā 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
Ā 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
Ā 
šŸ¬ The future of MySQL is Postgres šŸ˜
šŸ¬  The future of MySQL is Postgres   šŸ˜šŸ¬  The future of MySQL is Postgres   šŸ˜
šŸ¬ The future of MySQL is Postgres šŸ˜
Ā 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
Ā 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
Ā 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
Ā 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
Ā 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
Ā 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Ā 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Ā 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Ā 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Ā 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
Ā 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
Ā 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
Ā 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Ā 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
Ā 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Ā 
Finology Group ā€“ Insurtech Innovation Award 2024
Finology Group ā€“ Insurtech Innovation Award 2024Finology Group ā€“ Insurtech Innovation Award 2024
Finology Group ā€“ Insurtech Innovation Award 2024
Ā 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Ā 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
Ā 

Cisco WebEx Syslog Indexing with Flume and Solr

  • 1. C97-717209-00 © 2012 Cisco and/or its affiliates. All rights reserved.
  • 2. Outline
       1. Intro
       2. Problem to solve?
       3. How does Flume/Solr help?
       4. Syslog indexing example
       5. HA, DR & scalability
  • 3. Ops Architect at Cisco CCATG (WebEx)
       Ensure operational readiness for complex distributed services
       HA, DR, monitoring, config, deployment
       Previously eBay, Excite@Home, IBM, VISA
       Operations architecture, monitoring, event correlation
  • 5. Cisco WebEx Meetings
       • Voice, video, desktop sharing
       • Meeting/Event/Support/Training
       • Centers
       • Integration with TelePresence
       Cisco WebEx Social
       • Social networking
       • Content creation
       • Integrated IM
       Cisco WebEx Messenger
       • IM, presence
       • Integrate with voice, video
       • XMPP
  • 6. Participants from over 231 countries, 52% market share
       2.2 billion meeting minutes per month
       40.5 million meeting attendees per month
       9.4 million registered hosts worldwide
       4 million mobile downloads
  • 7. [Map legend: datacenter / PoP; leased network link]
       Global scale: 13 datacenters & iPoPs around the globe
       Dedicated network: dual-path 10G circuits between DCs
       Multi-tenant: 95k sites
       Real-time collaboration: voice, desktop sharing, video, chat
  • 8. [Map legend: datacenter / PoP; leased network link]
       People make mistakes
       Hardware fails
       Software fails
       Even failovers sometimes fail
  • 9. “If a problem has no solution, it may not be a problem, but a fact, not to be solved, but to be coped with over time” - Shimon Peres (“Peres’s Law”)
       People/HW/SW failures are facts, not problems
       Operations' main goal is to maintain high service availability
       • Recovery/repair is how we cope with the above facts
       • Improving recovery/repair improves availability
       Unavailability = MTTR / MTBF
       1/10th the MTTR is just as valuable as 10x the MTBF
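
The MTTR/MTBF relationship on slide 9 can be checked with a few lines of arithmetic. This is a minimal sketch: the hour values are made-up illustrative numbers, not figures from the talk, and the function name is hypothetical.

```python
def unavailability(mttr_hours: float, mtbf_hours: float) -> float:
    """Approximate fraction of time the service is down: MTTR / MTBF."""
    return mttr_hours / mtbf_hours

# Illustrative numbers only (assumptions, not from the slides).
baseline = unavailability(mttr_hours=1.0, mtbf_hours=1000.0)
faster_repair = unavailability(mttr_hours=0.1, mtbf_hours=1000.0)    # 1/10th MTTR
fewer_failures = unavailability(mttr_hours=1.0, mtbf_hours=10000.0)  # 10x MTBF

# Both improvements cut unavailability by the same factor of ten.
print(baseline, faster_repair, fewer_failures)
```

Cutting repair time tenfold and making failures ten times rarer give the same unavailability, which is why the deck treats faster root-cause search as an availability investment.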
  • 10. Even better: proactive
        Good: reactive
        Your search – What is the root cause of the outage? – did not match any documents.
  • 12. [Architecture diagram; labels recovered from the slide]
        Unstructured/semi-structured data: Log4j, File, Avro, Syslog, Thrift, AMQP, HTTP/REST
        Structured data: Application state & APIs, RDBMS, MySQL, Sqoop
        Flume: HDFS Sink, Solr Sink, other sinks
        HDFS, SolrCloud: raw data, Solr index
        Cisco UCS C240 M3 servers: 12 x 3 TB = 36 TB / server
  • 13. [Diagram: two-tier Flume topology]
        Collector tier: Flume agents in DC 1 … DC N, fed by syslog, log4j, and file inputs
        Storage tier: Flume agents in DC 1 and DC 2, each with HDFS and SolrCloud
  • 14. Flume Collector server: replicating fan-out flow
        [Diagram: failover & load-balancing agents -> Avro source -> File Channel 1, File Channel 2, … -> DC1 Avro sink, DC2 Avro sink -> Flume storage tier in DC1 and DC2]
        All events replicated to both channels
  • 15. [Diagram repeated from slide 13: collector tier in DC 1 … DC N; storage tier with HDFS and SolrCloud in DC 1 and DC 2]
  • 16. Flume Storage tier server: multiplexing fan-out flow
        [Diagram: Flume Collectors (failover & load-balancing agents) -> Avro source -> File Channel 1, File Channel 2, … -> Solr sink, HDFS sink -> SolrCloud, HDFS]
        Routing to Solr by Flume event header
        All events to HDFS
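
The difference between the replicating flow (slide 14) and the multiplexing flow (slide 16) can be sketched in plain Python. This is not Flume code or Flume configuration; the channel names, the `index` header, and its `solr` value are hypothetical stand-ins for whatever the real deployment uses.

```python
from typing import Dict, List

def replicating_selector(channels: List[str]) -> List[str]:
    """Replicating fan-out: every event is copied to every channel."""
    return list(channels)

def multiplexing_selector(headers: Dict[str, str], header_name: str,
                          mapping: Dict[str, List[str]],
                          default: List[str]) -> List[str]:
    """Multiplexing fan-out: the value of one event header picks the channels."""
    return mapping.get(headers.get(header_name, ""), default)

# Hypothetical names: "hdfs_ch" feeds the HDFS sink, "solr_ch" feeds the Solr sink.
MAPPING = {"solr": ["hdfs_ch", "solr_ch"]}  # events flagged for indexing
DEFAULT = ["hdfs_ch"]                       # everything else: HDFS only

indexed = multiplexing_selector({"index": "solr"}, "index", MAPPING, DEFAULT)
raw_only = multiplexing_selector({"host": "h1"}, "index", MAPPING, DEFAULT)
print(indexed, raw_only)
```

With this mapping every event reaches HDFS, and only events carrying the chosen header value are also routed toward Solr, matching the two annotations on slide 16.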
  • 17. Isn’t Big Data “schema on read”?
        • Why does Solr require a schema on write?
        • Dirty little secret: there’s always a schema
        • Performance & functionality vs flexibility
        • Optimize operations and storage based on field type - that's how you get sub-second response times
        There’s always a schema
        • Application code vs. central location
  • 18. Cloudera Morphlines
        • Framework to simplify event transformation
        • Compatible with existing grok patterns
        • Reusable across multiple index workloads: Flume & M/R
        [Diagram: inside the SolrSink, a Flume event (headers + body) becomes a Record that passes through a chain of commands (readLine, grok, tryRules, addValues, …, loadSolr) and reaches Solr as a document matching schema.xml]
  • 19. Convert syslog message ..
          <179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234
        .. into Solr schema fields
          Severity=[3]
          Facility=[22]
          host=[colo01-wxp00-ace01b-connect.webex.com]
          timestamp=[2013-06-16T04:36:49.000Z]
          syslog_message=[%ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]
          severity_label=[error]
          access_token=[54asdf654]
          id=[b2f839c3-dece-404f-a535-e0141ad549bf]
          cisco_product=[ACE]
          cisco_level=[3]
          cisco_id=[251008]
          cisco_code=[%ACE-3-251008]
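
The `Severity=[3]` and `Facility=[22]` values follow from the `<179>` prefix: a syslog priority is `facility * 8 + severity`. The sketch below shows that arithmetic in Python. It is not how the Flume syslog source is implemented, and the function and pattern names are made up for the example.

```python
import re
from typing import Tuple

PRI_PATTERN = re.compile(r"^<(\d{1,3})>")

def decode_priority(line: str) -> Tuple[int, int]:
    """Split the leading <PRI> of a syslog line into (facility, severity)."""
    match = PRI_PATTERN.match(line)
    if match is None:
        raise ValueError("no syslog priority prefix found")
    # PRI = facility * 8 + severity, so divmod recovers both parts.
    return divmod(int(match.group(1)), 8)

line = ("<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com : "
        "%ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234")
facility, severity = decode_priority(line)
print(facility, severity)  # 22 3
```

Facility 22 is `local6` in the standard syslog numbering, and severity 3 is the "error" level that later appears as `severity_label`.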
  • 20. Convert syslog message
          <179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234
        Step 1: readLine reads in Flume event headers and body
        Headers:
          timestamp=[1371357409000]
          host=[colo01-wxp00-ace01b-connect.webex.com]
          category=[545f5sfsd5sf]
          Severity=[3]
          Facility=[22]
        Body:
          message=[<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]
  • 21. Step 2: convertTimestamp converts epoch to ISO 8601 format
          timestamp=[2013-06-16T04:36:49.000Z]
          host=[colo01-wxp00-ace01b-connect.webex.com]
          access_token=[545f5sfsd5sf]
          message=[<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]
          Severity=[3]
          Facility=[22]
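
The effect of the convertTimestamp step can be reproduced with the Python standard library. This is a sketch of the conversion only, assuming the header holds epoch milliseconds in UTC as the slide values suggest. It is not the morphline command itself, and the function name is hypothetical.

```python
from datetime import datetime, timezone

def epoch_millis_to_iso8601(millis: int) -> str:
    """Format epoch milliseconds as an ISO 8601 UTC timestamp with millisecond precision."""
    seconds, ms = divmod(millis, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{ms:03d}Z"

iso = epoch_millis_to_iso8601(1371357409000)
print(iso)  # 2013-06-16T04:36:49.000Z
```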
  • 22. Step 3: addValues creates new field access_token
          timestamp=[2013-06-16T04:36:49.000Z]
          category=[545f5sfsd5sf]
          access_token=[545f5sfsd5sf]
          message=[<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]
          host=[colo01-wxp00-ace01b-connect.webex.com]
          Severity=[3]
          Facility=[22]
  • 23. Step 4: tryRules creates field severity_label for severity
          timestamp=[2013-06-16T04:36:49.000Z]
          severity_label=[error]
          access_token=[545f5sfsd5sf]
          message=[<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]
          host=[colo01-wxp00-ace01b-connect.webex.com]
          category=[545f5sfsd5sf]
          Severity=[3]
          Facility=[22]
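
The severity-to-label step amounts to a lookup on the numeric syslog severity. The sketch below uses the standard eight syslog severity levels. Only the label `error` for severity 3 is confirmed by the slide; the other label spellings and the function name are assumptions.

```python
from typing import Dict

# Standard syslog severity levels 0-7; label spellings other than "error" are assumed.
SEVERITY_LABELS = ["emergency", "alert", "critical", "error",
                   "warning", "notice", "info", "debug"]

def add_severity_label(record: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the record with severity_label derived from Severity."""
    out = dict(record)
    out["severity_label"] = SEVERITY_LABELS[int(record["Severity"])]
    return out

labelled = add_severity_label({"Severity": "3", "Facility": "22"})
print(labelled["severity_label"])  # error
```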
  • 24. Step 5: tryRules creates new fields
          syslog_message=[%ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]
          cisco_product=[ACE]
          cisco_level=[3]
          cisco_id=[251008]
          cisco_code=[%ACE-3-251008]
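
The Cisco-specific fields can be pulled out of the message body with a single regular expression. The talk uses grok patterns inside the morphline for this. The Python pattern below is an illustrative equivalent written for this one message shape, not the pattern from the deck.

```python
import re
from typing import Dict

# Matches "%PRODUCT-LEVEL-ID: text" and names each part after the Solr field it fills.
CISCO_PATTERN = re.compile(
    r"(?P<syslog_message>"
    r"(?P<cisco_code>%(?P<cisco_product>[A-Z0-9_]+)-(?P<cisco_level>[0-7])-(?P<cisco_id>\d+))"
    r":.*)$"
)

def extract_cisco_fields(body: str) -> Dict[str, str]:
    """Return the Cisco message fields found in a syslog body, or {} if none match."""
    match = CISCO_PATTERN.search(body)
    return match.groupdict() if match else {}

body = ("<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com "
        "Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed "
        "for server 10.240.22.111 on port 1234")
fields = extract_cisco_fields(body)
print(fields["cisco_product"], fields["cisco_level"], fields["cisco_id"])
```

Returning an empty dict on a non-match mirrors the idea behind tryRules: a message that does not fit this rule is left for another rule to handle.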
  • 25. Step 6: sanitizeUnknownSolrFields drops non-schema fields
          timestamp=[2013-06-16T04:36:49.000Z]
          syslog_message=[%ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]
          severity_label=[error]
          access_token=[545f5sfsd5sf]
          host=[colo01-wxp00-ace01b-connect.webex.com]
          cisco_product=[ACE]
          cisco_level=[3]
          cisco_id=[251008]
          cisco_code=[%ACE-3-251008]
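
Dropping non-schema fields is a filter against the set of field names the index knows about. In the sketch below the schema set is inferred from the fields that survive on slides 25 and 26. The real list lives in the deployment's schema.xml, which the deck does not show.

```python
from typing import Dict, Set

# Assumed schema: the field names that remain after step 6, plus "id" from step 7.
SCHEMA_FIELDS: Set[str] = {
    "timestamp", "syslog_message", "severity_label", "access_token", "id",
    "host", "cisco_product", "cisco_level", "cisco_id", "cisco_code",
}

def sanitize_unknown_fields(record: Dict[str, str],
                            schema: Set[str] = SCHEMA_FIELDS) -> Dict[str, str]:
    """Keep only the fields that the target schema defines."""
    return {name: value for name, value in record.items() if name in schema}

record = {
    "timestamp": "2013-06-16T04:36:49.000Z",
    "host": "colo01-wxp00-ace01b-connect.webex.com",
    "category": "545f5sfsd5sf",
    "Severity": "3",
    "Facility": "22",
    "message": "<179>Jun 16 04:36:49 ...",
}
clean = sanitize_unknown_fields(record)
print(sorted(clean))  # ['host', 'timestamp']
```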
  • 26. Step 7: generateUUID creates a unique id for the document
          timestamp=[2013-06-16T04:36:49.000Z]
          syslog_message=[%ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]
          severity_label=[error]
          access_token=[545f5sfsd5sf]
          id=[b2f839c3-dece-404f-a535-e0141ad549bf]
          host=[colo01-wxp00-ace01b-connect.webex.com]
          cisco_product=[ACE]
          cisco_level=[3]
          cisco_id=[251008]
          cisco_code=[%ACE-3-251008]
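
The id on slide 26 has the shape of a random (version 4) UUID. A minimal Python sketch of attaching one to a record follows. Using `uuid4` and the helper name are assumptions; the deck does not say how generateUUID produces its value.

```python
import uuid
from typing import Dict

def with_generated_id(record: Dict[str, str], id_field: str = "id") -> Dict[str, str]:
    """Return a copy of the record with a random UUID stored under id_field."""
    out = dict(record)
    out[id_field] = str(uuid.uuid4())
    return out

doc = with_generated_id({"host": "colo01-wxp00-ace01b-connect.webex.com"})
print(doc["id"])
```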
  • 27. Step 8: loadSolr loads a record into a Solr server
  • 28. [Diagram: inside the SolrSink, a Flume syslog event (headers + body) becomes a Record that passes through a chain of commands (readLine, grok, tryRules, addValues, …, loadSolr) and reaches SolrCloud as a document matching schema.xml]
  • 29. Where can I index data? Diagram: SolrCloud cluster with a ZooKeeper ensemble (zk1, zk2, zk3), one Collection split into Shard1 (leader1, replica1), Shard2 (leader2, replica2) and Shard3 (leader3, replica3), on a pluggable filesystem (local, HDFS); "Add doc to syslog index" • Collections, shards & replicas • Pluggable file system • Central config & coordination with ZK • Full HA, automatic fail-over • NRT indexing • Automatic routing
  • 30. Collection “syslog” with three shards
  • 31. Special case of search • Logs are time series data: timestamp + data • High indexing rate, no updates • New data is searched more frequently than old. Collection aliases • Time-partitioned collections – e.g. one collection per day • Reduces the workload to near-real-time data only • One-to-many collection mapping: queries go to a logical name (alias) mapped to multiple collections with the same schema • Simplifies hot-warm-cold migration of data. Index expiration • Old data is aged out with collection aliases • Remap only the latest collections to an alias
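The alias idea above can be sketched in Python as follows. The sketch only builds the request URL for Solr's Collections API `CREATEALIAS` action and does not contact a server. The daily naming scheme `syslog_YYYYMMDD`, the alias name `syslog_recent`, and the host `solr.example.com` are assumptions for illustration, not values from the talk.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def daily_collections(today: date, days: int, prefix: str = "syslog_") -> list:
    """Names of the most recent `days` daily collections, newest first."""
    return [
        f"{prefix}{(today - timedelta(days=offset)).strftime('%Y%m%d')}"
        for offset in range(days)
    ]


def create_alias_url(solr_base: str, alias: str, collections: list) -> str:
    """Build a Collections API CREATEALIAS request for the given collections."""
    query = urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    })
    return f"{solr_base}/admin/collections?{query}"


recent = daily_collections(date(2013, 6, 16), days=3)
url = create_alias_url("http://solr.example.com:8983/solr", "syslog_recent", recent)
print(recent)
print(url)
```

Re-issuing the same request each day with a shifted window is what ages old data out of the alias: queries keep using one logical name while the set of collections behind it moves forward in time.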
  • 32. Solr • No multi-datacenter cluster support. HDFS • No multi-datacenter cluster support. Options? • All our services must survive a DC outage • ... so should logging and indexing
  • 33. Diagram: Collector tier: Flume collectors in DC 1, DC 2, … DC N, each fed by syslog, log4j and file sources. Storage tier: Flume, HDFS and SolrCloud in both DC 1 and DC 2. Annotations: Replicate aggregate data; Planned or unplanned outage; Flume Collector disk channel buffering DC1 events; DC1 Hadoop cluster back online after outage
  • 34. Diagram: Collector tier: Flume collectors in DC 1, DC 2, … DC N, each fed by syslog, log4j and file sources. Storage tier: Flume, HDFS and SolrCloud in DC 1 and DC 2, linked by distcp. Annotations: Manual CNAME change to DC2; DC1 back online, sync data from DC2; Data sent only to a single DC; DNS CNAME change back to DC1; Flip distcp the other way; Flume buffering events at collector tier; Create indexes with M/R
  • 35. Tiers to scale • Flume Collector tier • Flume Storage tier • SolrCloud
  • 36. 100 – 5000 servers per datacenter. Diagram: many agents send to a Flume Collector (Avro src, replicating fan-out flow into File Channel 1 and File Channel 2, feeding a DC1 Avro sink and a DC2 Avro sink); a second, identical collector is shown for "More agents and data". FileChannel: 14MB/sec; NIC: 100MB/sec. Max per server: 14MB/s, 1.2 TB/day, 70k events/s
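The per-server figures on this slide are consistent with each other, which a few lines of Python arithmetic can confirm. The sketch assumes decimal units (1 TB = 1,000,000 MB) and that the 14 MB/s and 70k events/s figures describe the same load; the average event size it derives is therefore an inference, not a number from the talk.

```python
# Figures quoted on the slide.
FILE_CHANNEL_MB_PER_SEC = 14
EVENTS_PER_SEC = 70_000

SECONDS_PER_DAY = 24 * 60 * 60

# Daily volume at the FileChannel limit, in decimal terabytes.
tb_per_day = FILE_CHANNEL_MB_PER_SEC * SECONDS_PER_DAY / 1_000_000

# Implied average event size if both slide figures describe the same load.
avg_event_bytes = FILE_CHANNEL_MB_PER_SEC * 1_000_000 / EVENTS_PER_SEC

print(f"{tb_per_day:.2f} TB/day, about {avg_event_bytes:.0f} bytes per event")
```

Since the NIC is quoted at 100MB/sec, the FileChannel rather than the network is the limiting factor per collector, which is why the tier scales by adding collectors.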
  • 37. Diagram: collectors in DC 1, DC 2, … DC N each send through Avro sinks 1 to N to the storage tiers in DC 1 and DC 2. Each storage-tier Flume agent: Avro src, multiplexing fan-out flow into File Chan1 and File Chan2, feeding an HDFS sink and a Solr sink. Max per server: 14MB/s, 1.2 TB/day, 70k events/s
  • 38. Diagram: SolrCloud cluster with a ZooKeeper ensemble (zk1, zk2, zk3), Shard1 (leader1, replica1), Shard2 (leader2, replica2) and Shard3 (leader3, replica3), on a pluggable filesystem (local, HDFS), receiving new logs to index and search queries. 1000 tx/sec/core; 2x8 cores: 16k tx/sec; 3 shards: 3 x 16k = 48k tx/sec
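The capacity estimate on this slide is straightforward multiplication, reproduced below in Python. The `shards_needed` helper is a hypothetical extension that assumes throughput scales linearly with the number of shards; the slides give only the three-shard figure, so treat any other result as a rough planning guess.

```python
import math

# Figures quoted on the slide.
TX_PER_SEC_PER_CORE = 1_000
CORES_PER_SERVER = 2 * 8  # two sockets with eight cores each

tx_per_server = TX_PER_SEC_PER_CORE * CORES_PER_SERVER
cluster_tx_per_sec = 3 * tx_per_server  # three shards


def shards_needed(target_tx_per_sec: int) -> int:
    """Shards required for a target rate, assuming linear scaling (an assumption)."""
    return math.ceil(target_tx_per_sec / tx_per_server)


print(tx_per_server, cluster_tx_per_sec, shards_needed(70_000))
```

Under the linear-scaling assumption, absorbing the 70k events/s that a single Flume server can deliver would take more than the three shards shown in the diagram.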
  • 39. Central syslog servers • Network and OS system messages forwarded to several central syslog servers. Forward syslog to Solr using Flume Morphline SolrSink • Parse messages with Morphline and grok patterns. SolrCloud • Index log lines as documents into a Collection (i.e. index). HUE Solr search • Simple UI to build a customized search page layout with faceting and sorting • Easy drill-down with multiple facets: severity, datacenter, hostname, etc.
  • 40. Screen shots
  • 41. Search by time; sort by selected field; facets by selected fields
  • 42. Wildcard query by field; highlight the query keywords
  • 43. Data sources: REST/JSON, log4j, syslog, Avro, Thrift. Parsing: Cloudera Morphlines. NRT indexing: SolrCloud embedded in CDH. Batch indexing: MapReduce. Analytics: use your favorite tool; raw detailed data is stored in HDFS
  • 44. email: ari.flink@webex.com twitter: @raaka
  • 45. Thank you.

Editor's Notes

  1-3. As of Feb 2013
  4-16. CEP: Complex Event Processing