Hadoop Spark online demo
1. Apache Spark
● What is it?
● How does it work?
● Benefits
● Tuning
● Examples
www.xoomtrainings.com sales@xoomtrainings.com
2. Spark – What is it?
● Open source
● Alternative to MapReduce for certain applications
● A low-latency cluster computing system
● For very large data sets
● May be up to 100 times faster than MapReduce for
– Iterative algorithms
– Interactive data mining
● Used with Hadoop / HDFS
● Early releases under the BSD License; Apache License 2.0 since version 0.8.0
3. Spark – How does it work?
● Uses in-memory cluster computing
● Memory access is faster than disk access
● Has APIs for
– Scala
– Java
– Python
● Can be accessed from Scala and Python shells
● An Apache project: in the Incubator when this deck was written, a top-level project since February 2014
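The in-memory point can be illustrated with an iterative job that rereads the same data on every pass. The following is a minimal sketch, not taken from the original deck: it assumes a later Spark release that uses the `org.apache.spark` package names and `SparkConf`, and the object name `CacheSketch`, the helper `estimateMean`, and the toy data are all hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example: names and data are illustrative only.
object CacheSketch {
  // Estimates the mean of `data` by gradient descent on the squared error.
  // Every iteration is one pass over the same cached RDD.
  def estimateMean(sc: SparkContext, data: Seq[Double], iterations: Int): Double = {
    val points = sc.parallelize(data).cache() // mark the RDD to be kept in memory
    val n = points.count()                    // first action computes and caches it
    var estimate = 0.0
    for (_ <- 1 to iterations) {
      val current = estimate                  // stable value for the closure to capture
      val gradient = points.map(x => current - x).reduce(_ + _) / n
      estimate -= 0.5 * gradient              // step size 0.5 halves the error each pass
    }
    estimate
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("Cache Sketch")
    val sc = new SparkContext(conf)
    try {
      println("Estimated mean: " + estimateMean(sc, Seq(1.0, 2.0, 3.0, 4.0), 30))
    } finally {
      sc.stop()
    }
  }
}
```

Without the `cache()` call each iteration would recompute `points` from its source; with it, only the first action pays that cost.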
4. Spark – Benefits
● Scales to very large clusters
● Uses in-memory processing for increased speed
● High-level APIs
– Java, Scala, Python
● Low-latency shell access
5. Spark – Tuning
● Bottlenecks can arise in the cluster from
– CPU, memory or network bandwidth
● Tune the data serialization method, e.g.
– Java ObjectOutputStream vs Kryo
● Memory tuning
– Use primitive types
– Set JVM flags
– Store objects in serialized form, e.g. RDD persistence with the MEMORY_ONLY_SER storage level
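The serialization and persistence bullets above can be sketched in code. This is a minimal sketch under stated assumptions, not the deck's own material: it assumes a later Spark release that uses the `org.apache.spark` package names and `SparkConf`, and the object name `TuningSketch`, the helper `sumAndEvenCount`, and the toy data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical example: names and data are illustrative only.
object TuningSketch {
  // Returns (sum of the squares of 1..n, number of even squares).
  def sumAndEvenCount(n: Int): (Long, Long) = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("Tuning Sketch")
      // Use Kryo instead of the default Java serialization
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)
    try {
      val squares = sc.parallelize(1 to n).map(i => i.toLong * i)
      // Keep the RDD in memory as serialized bytes: smaller, but costs CPU to read
      squares.persist(StorageLevel.MEMORY_ONLY_SER)
      val total = squares.reduce(_ + _)              // first action fills the cache
      val evens = squares.filter(_ % 2 == 0).count() // second action reuses it
      (total, evens)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (total, evens) = sumAndEvenCount(1000)
    println("Sum of squares: %s, even squares: %s".format(total, evens))
  }
}
```

Serialized storage trades CPU time for memory, so it tends to pay off mainly when the deserialized objects do not fit comfortably in memory.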
6. Spark – Examples
• Example from spark-project.org: a Spark job in Scala.
• Counts the lines of a system log that contain "a" and the lines that contain "b".

/*** SimpleJob.scala ***/

import spark.SparkContext
import SparkContext._

object SimpleJob {
  def main(args: Array[String]) {
    val logFile = "/var/log/syslog" // Should be some file on your system
    // Arguments: master, job name, Spark home, jars to ship to the workers
    val sc = new SparkContext("local", "Simple Job", "$YOUR_SPARK_HOME",
      List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
    // Read the file in 2 partitions and keep it in memory for reuse
    val logData = sc.textFile(logFile, 2).cache()
    // Both counts run against the cached data
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
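The listing above targets the early `spark` package. As a rough sketch of how the same job might look on a later release, assuming the `org.apache.spark` package names and `SparkConf`, the version below keeps the logic unchanged. The object name `SimpleJobUpdated` and the helper `countLines` are hypothetical and not part of the original example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical port of SimpleJob: same logic, later package names.
object SimpleJobUpdated {
  // Returns (lines containing "a", lines containing "b") for the file at `path`.
  def countLines(sc: SparkContext, path: String): (Long, Long) = {
    val logData = sc.textFile(path, 2).cache() // cached because it is scanned twice
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    (numAs, numBs)
  }

  def main(args: Array[String]): Unit = {
    val logFile = "/var/log/syslog" // Should be some file on your system
    val conf = new SparkConf().setMaster("local").setAppName("Simple Job")
    val sc = new SparkContext(conf)
    try {
      val (numAs, numBs) = countLines(sc, logFile)
      println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    } finally {
      sc.stop() // release the local Spark resources
    }
  }
}
```

Splitting the counting into `countLines` is only a convenience so the logic can be exercised against any small text file.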
7. Contact Us
● Feel free to contact us at
– www.xoomtrainings.com
– sales@xoomtrainings.com
– USA: +1-610-686-8077 or India: +91-404-018-3355
● We offer IT project consultancy
● We are happy to hear about your problems
● You can pay for just the hours you need to solve your problems