Combat Cyber Threats with Cloudera Impala & Apache Hadoop



Learn how you can use Cloudera Impala to:

- Operate with all data in your domain
- Address cyber security analysis and forensics needs
- Combat fraud, waste, and abuse



Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

  • Interactive SQL for Hadoop
    • Responses in seconds vs. minutes or hours
    • 4-100x faster than Hive
    • Nearly ANSI-92 standard SQL with HiveQL
      • CREATE, ALTER, SELECT, INSERT, JOIN, subqueries, etc.
      • ODBC/JDBC drivers
      • Compatible SQL interface for existing Hadoop/CDH applications
  • Native MPP Query Engine
    • Purpose-built for low-latency queries (another application being brought to Hadoop)
    • Separate runtime from MapReduce, which is designed for batch processing
    • Tightly integrated with the Hadoop ecosystem (a major design imperative and differentiator for Cloudera)
      • Single system (no integration)
      • Native, open file formats that are compatible across the ecosystem (no copying)
      • Single metadata model (no synchronization)
      • Single set of hardware and system resources (better performance, lower cost)
      • Integrated, end-to-end security (no vulnerabilities)
  • Open Source
    • Keeps with our strategy of an open platform, i.e. if it stores or processes data, it's open source
    • Apache-licensed
    • Code available on GitHub
  • More & Faster Value from Big Data
    • Provides an interactive BI/Analytics experience on Hadoop
      • Previously BI/Analytics was impractical due to the batch orientation of MapReduce
    • Enables more users to gain value from organizational data assets (SQL/BI users)
    • Makes more data available for analysis (raw data, multi-structured data, historical data)
    • Removes delays from data migration
      • Into specialized analytical DBMSs
      • Into proprietary file formats that happen to be stored in HDFS
      • Into transient in-memory stores
  • Flexibility
    • Query across existing data in Hadoop
      • HDFS and HBase
      • Access data immediately and directly in its native format
    • Select best-fit file formats
      • Use raw data formats when unsure of access patterns (text files, RCFiles, LZO)
      • Increase performance with optimized file formats when access patterns are known (Parquet, Avro)
      • All file formats are compatible across the entire Hadoop ecosystem, i.e. MapReduce, Pig, Hive, Impala, etc. can work on the same data at the same time
  • Cost Efficiency
    • Reduce movement, duplicate storage & compute
      • Data movement: no time or resource penalty for migrating data into specialized systems or formats
      • Duplicate storage: no need to duplicate data across systems or within the same system in different file formats
      • Compute: use the same compute resources as the rest of the Hadoop system
        • You don't need a separate set of nodes to run interactive query vs. batch processing (MapReduce)
        • You don't need to overprovision your hardware to enable memory-intensive, on-the-fly format conversions
    • 10% to 1% of the cost of an analytic DBMS
      • Less than $1,000/TB
  • Full Fidelity Analysis
    • No loss of fidelity from aggregations or conforming to fixed schemas
    • If the attribute exists in the raw data, you can query against it
  • This is an overview of the simple cluster I put together for the webinar. It has 4 nodes in total: a 3-node Hadoop cluster and an application server. This configuration is one that would be present in many public and private organizations. We have placed a sensor at the gateway, or gateways, across the enterprise to monitor incoming and outgoing traffic. This information is captured by a variety of sensors/collectors and written to files on a regular basis. Now let's go through the data sets.
  • 1.) Provide a brief tour of the cluster using Cloudera Manager
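
The notes above say Impala accepts nearly ANSI-92 SQL, including JOINs and subqueries, over data that gateway sensors write to files. As a rough sketch of what a forensics-style query in that dialect might look like, the Python below joins flow records against a watchlist. Everything named here is an assumption for illustration: the `flows` and `watchlist` tables, their columns, and the sample addresses do not come from the presentation, and SQLite stands in for Impala only so the example runs without a cluster.

```python
import sqlite3

# SQLite is a stand-in for Impala so this sketch runs anywhere.
# Table and column names (flows, watchlist, src_ip, dst_ip, byte_count)
# are hypothetical and not taken from the presentation.

def build_demo_db():
    """Create an in-memory database with a few made-up flow records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flows (src_ip TEXT, dst_ip TEXT, byte_count INTEGER)"
    )
    conn.execute("CREATE TABLE watchlist (ip TEXT)")
    conn.executemany(
        "INSERT INTO flows VALUES (?, ?, ?)",
        [
            ("10.0.0.5", "203.0.113.9", 4000),
            ("10.0.0.5", "198.51.100.2", 100),
            ("10.0.0.7", "203.0.113.9", 50),
            ("10.0.0.8", "192.0.2.44", 9000),
        ],
    )
    conn.execute("INSERT INTO watchlist VALUES ('203.0.113.9')")
    return conn


def suspicious_talkers(conn, min_bytes):
    """Return (src_ip, bytes sent to watchlisted hosts) for heavy senders.

    A source qualifies when its total traffic across all destinations is
    at least min_bytes (the subquery) and it contacted a watchlisted
    destination (the JOIN).
    """
    query = """
        SELECT f.src_ip, SUM(f.byte_count) AS total_bytes
        FROM flows f
        JOIN watchlist w ON f.dst_ip = w.ip
        WHERE f.src_ip IN (
            SELECT src_ip
            FROM flows
            GROUP BY src_ip
            HAVING SUM(byte_count) >= ?
        )
        GROUP BY f.src_ip
        ORDER BY total_bytes DESC
    """
    return conn.execute(query, (min_bytes,)).fetchall()
```

Calling `suspicious_talkers(build_demo_db(), 1000)` on the sample rows returns only the one heavy sender that reached the watchlisted address. On a real cluster the same SELECT would be submitted through an ODBC/JDBC driver or a client library instead of `sqlite3`, and the parameter placeholder syntax may differ by driver.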

Presentation Transcript