SPAM DETECTION
By Finny, Omkar, Sreenivas
BIG DATA
Big data refers to the large, diverse sets of
information that grow at ever-increasing rates.
The three Vs: Volume, Velocity, Variety
• Apache Spark
⚬ an open-source unified analytics engine
⚬ for large-scale data processing
• Used as a replacement for MapReduce
• Processes data using the Master-Slave principle
Spark SQL
Apache Spark's module for
working with structured data.
Spark Streaming
Lets streaming data be
analyzed in real time.
MLlib
Apache Spark's machine
learning library.
GraphX
Apache Spark's library for
graph analysis.
The biggest harm of spam emails is that, contrary to
popular belief, the recipient bears more of the cost
than the sender.
FLATMAP
Collapses the elements of a collection to create a
single collection with elements of the same type.
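In the spam pipeline, flatMap is what turns a collection of whole emails into one flat collection of individual words. Spark's JavaRDD.flatMap has the same shape as Java's own Stream.flatMap, so the semantics can be sketched without a cluster (the email strings below are made-up sample data):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FlatMapDemo {
    // Collapse per-email word arrays into one flat list of words,
    // mirroring what JavaRDD.flatMap does across a distributed dataset.
    static List<String> tokenize(List<String> emails) {
        return emails.stream()
                .flatMap(email -> Arrays.stream(email.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> emails = Arrays.asList("win a free prize", "meeting at noon");
        System.out.println(tokenize(emails));
        // prints [win, a, free, prize, meeting, at, noon]
    }
}
```

Note the collapse: two input elements become seven output elements of the same type (String), rather than two nested lists.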
JAVA RDD
Resilient Distributed Datasets serve as the
building blocks for distributed data processing in
Spark.
SPARK CONTEXT
To perform data analysis with Spark, a Spark
Context is required. It serves as a bridge to access
data within the Spark environment.
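As a minimal sketch of that bridge, assuming Spark's Java API is on the classpath (the app name is a placeholder and "local[*]" is just the local-mode choice, not necessarily what the authors used):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ContextSetup {
    public static void main(String[] args) {
        // "local[*]" runs Spark on all local cores; on a cluster the
        // master URL would point at the cluster manager instead.
        SparkConf conf = new SparkConf()
                .setAppName("SpamDetection")   // placeholder app name
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // ... build and process RDDs through sc ...

        sc.stop();
    }
}
```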
1. Creating a Spark Context
2. Java RDD structure
3. Separate RDDs for spam and non-spam emails
4. FlatMap process
5. Text-to-vector transformation
6. Modeling with Naive Bayes
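Put together, these steps can be sketched with Spark's MLlib classes (a sketch under assumptions, not the authors' exact code: the file paths, the 10,000-feature HashingTF size, and the sample message are all placeholders, and the per-email word split here happens inside map so each email keeps its own feature vector):

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.classification.NaiveBayes;
import org.apache.spark.mllib.classification.NaiveBayesModel;
import org.apache.spark.mllib.feature.HashingTF;
import org.apache.spark.mllib.regression.LabeledPoint;

public class SpamPipeline {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("SpamDetection"));

        // Separate RDDs for spam and non-spam emails (placeholder paths).
        JavaRDD<String> spam = sc.textFile("spam.txt");
        JavaRDD<String> ham = sc.textFile("ham.txt");

        // Text-to-vector transformation: hash each email's words into a
        // fixed-size term-frequency vector (10,000 features assumed).
        HashingTF tf = new HashingTF(10_000);
        JavaRDD<LabeledPoint> spamPoints = spam.map(email ->
                new LabeledPoint(1.0, tf.transform(Arrays.asList(email.split(" ")))));
        JavaRDD<LabeledPoint> hamPoints = ham.map(email ->
                new LabeledPoint(0.0, tf.transform(Arrays.asList(email.split(" ")))));

        // Modeling with Naive Bayes on the combined training data.
        JavaRDD<LabeledPoint> training = spamPoints.union(hamPoints).cache();
        NaiveBayesModel model = NaiveBayes.train(training.rdd());

        // Score a new message: 1.0 = predicted spam, 0.0 = predicted ham.
        double label = model.predict(
                tf.transform(Arrays.asList("win a free prize now".split(" "))));
        System.out.println("predicted label: " + label);

        sc.stop();
    }
}
```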
10.5 M records
5 GB of emails
Average working time
CONCLUSION
Spark is faster.