What is Spark

A quick introduction to Apache Spark and how it fits into the cognitive world: how we can use it to support cognitive solutions and to build distributed algorithms for prediction and other machine learning tasks.

Agenda
What is Spark?
Spark Libraries and Architecture
Spark role in the Cognitive world
Introducing Data Science Experience
How we are using Spark at Cognitive@IBM - Brazil

What is Spark?
Spark is a framework, a set of APIs, and a parallel processing engine;
Created at the AMPLab (UC Berkeley);
Developed in Scala (GitHub: https://github.com/apache/spark);
Processes data in many formats and from many sources (text files, Parquet, Avro, databases, HDFS, S3, Object Storage, etc.);
Programs can be written in Java, Python, Scala or R;
Keeps data in memory (RAM) for fast processing.
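The points above (parallel engine, Python API, in-memory processing) can be illustrated with a small word count. This is only a sketch under stated assumptions: it assumes PySpark is installed, runs on a local two-thread master rather than a real cluster, and the names `tokenize`, `count_words` and the app name are hypothetical, chosen for this example.

```python
import re
from typing import Dict, List


def tokenize(line: str) -> List[str]:
    """Lowercase a line and split it into alphabetic words (plain Python)."""
    return re.findall(r"[a-z]+", line.lower())


def count_words(lines: List[str]) -> Dict[str, int]:
    """Count words across `lines` using a local Spark session."""
    # Imported here so tokenize() stays usable without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[2]")              # assumption: 2 local threads, no cluster
        .appName("what-is-spark-sketch")  # hypothetical application name
        .getOrCreate()
    )
    try:
        words = (
            spark.sparkContext
            .parallelize(lines, numSlices=2)  # split the data into 2 partitions
            .flatMap(tokenize)                # applied to each partition in parallel
            .cache()                          # ask Spark to keep the result in memory
        )
        # Classic map/reduce: emit (word, 1) pairs, then sum the counts per word.
        counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
        return dict(counts.collect())
    finally:
        spark.stop()
```

Calling `count_words(["Spark is fast", "Spark is a parallel engine"])` returns a dictionary of word counts. The same code should run against a cluster by changing only the master setting, which is the main appeal of the API.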

Spark Role in the Cognitive World
Predictions
Natural Language Processing
Watson Integration
Cognitive Solutions Integrator
Cognitive Decisions in Real Time with Watson Explorer
Unstructured Data Processing

Data Science Experience
datascience.ibm.com
IBM platform for running Spark code;
Uses Jupyter notebooks;
Program in Python, Scala or R;
Uses a Spark cluster from Bluemix;
Free service tier with 2 executors.

How we use Spark
Environment:
◦ Developing and testing on Data Science Experience;
◦ Created our own standalone cluster with 7 workers for production, running on SoftLayer;
◦ Created an auto-scaling standalone cluster using Docker containers on Bluemix;
Processing:
◦ Fast clustering and testing of new algorithms;
◦ Moving structured and unstructured data from different databases;
◦ Data cleaning;
◦ Speeding up ETL processes.
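As an illustration of the data cleaning and ETL use cases listed above, here is a minimal sketch of a cleaning step distributed with Spark. It is not the team's actual pipeline: the record fields (`id`, `text`), the function names and the local master setting are all assumptions made for the example, and PySpark is assumed to be installed.

```python
from typing import Dict, List, Optional


def clean_record(raw: Dict[str, Optional[str]]) -> Optional[Dict[str, str]]:
    """Normalize one record; return None when the record cannot be used."""
    record_id = (raw.get("id") or "").strip()
    if not record_id:
        return None  # drop rows without an identifier
    # Collapse runs of whitespace and lowercase the free-text field.
    text = " ".join((raw.get("text") or "").split()).lower()
    return {"id": record_id, "text": text}


def clean_all(records: List[Dict[str, Optional[str]]]) -> List[Dict[str, str]]:
    """Apply clean_record to every record in parallel on a local Spark session."""
    # Imported here so clean_record() can be tested without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[2]")          # assumption: local run, not a production cluster
        .appName("cleaning-sketch")  # hypothetical application name
        .getOrCreate()
    )
    try:
        return (
            spark.sparkContext
            .parallelize(records)
            .map(clean_record)                 # transform step, per record
            .filter(lambda r: r is not None)   # discard unusable records
            .collect()
        )
    finally:
        spark.stop()
```

Keeping the per-record logic in a plain Python function means it can be unit-tested without a cluster, while Spark only takes care of distributing the work.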

Resources
My article about Spark
◦ https://w3-connections.ibm.com/blogs/af5593c1-5dae-421e-87d6-6ac263973790/entry/Spark_what_is_that?lang=en_us
My GitHub repository on creating and running a Spark standalone cluster using Docker containers on Bluemix
◦ https://github.com/brunocfnba/docker-spark-cluster
Big Data Analysis with Apache Spark course (free, but with fixed enrollment periods)
◦ https://www.edx.org/course/big-data-analysis-apache-spark-uc-berkeleyx-cs110x
Apache Spark web site
◦ http://spark.apache.org/
