Strata Singapore survey
101 responses
Summary

Have you edited Wikipedia articles before?
Yes 28 27.7%
No 73 72.3%

Which of the following Spark components are you mostly interested in using after class?
Core Spark 70 69.3%
Spark SQL + DataFrames 78 77.2%
Spark Streaming 66 65.3%
MLlib (machine learning) 72 71.3%
GraphX 30 29.7%

Scala [Which programming language API of Spark are you most comfortable in?]
Zero Knowledge 44 43.6%
Beginner 47 46.5%
Medium 9 8.9%
Expert 1 1%

Java [Which programming language API of Spark are you most comfortable in?]
Zero Knowledge 18 17.8%
Beginner 21 20.8%
Medium 41 40.6%
Expert 21 20.8%

Python [Which programming language API of Spark are you most comfortable in?]
Zero Knowledge 30 29.7%
Beginner 31 30.7%
Medium 27 26.7%
Expert 13 12.9%

SQL [Which programming language API of Spark are you most comfortable in?]
Zero Knowledge 6 5.9%
Beginner 16 15.8%
Medium 54 53.5%
Expert 25 24.8%

R [Which programming language API of Spark are you most comfortable in?]
Zero Knowledge 33 32.7%
Beginner 43 42.6%
Medium 19 18.8%
Expert 6 5.9%

I would like the focus of the class to be:
Development (how to write Spark apps, API coverage, debugging) 86 85.1%
Administration / Ops (how Spark scales, configuration parameters, tuning) 39 38.6%
Architecture (how the JVMs interact with each other, Spark Standalone, YARN integration, etc) 62 61.4%
Use Cases (non-technical section on how companies are using Spark) 55 54.5%

How experienced are you with Spark?
Level 0: I am totally new to Spark 50 49.5%
Level 1: I have launched the Spark shell and executed a few transformations & actions and looked at the Spark UIs 32 31.7%
Level 2: I have either written 100+ lines of code for a Spark application or I understand the following: what narrow vs wide dependencies are, how to figure out which transformations cause a shuffle 15 14.9%
Level 3: I have been using Spark more than 50% of the time in my job for over 2 months in either a development or administration role 4 4%
Level 4: I have contributed at least 20 lines of code to the Apache Spark project 0 0%

For how long have you been doing hands-on Development or Operations work with Apache Spark?
Class day will be my first hands-on exposure to programming in Spark 52 51.5%
I have been playing with the Spark shells for less than a week 26 25.7%
I have under 1 month of experience with Spark 7 6.9%
I have 1 - 6 months of experience with Spark 13 12.9%
6 - 12 months 2 2%
1+ year 1 1%

Where are you in the Spark usage lifecycle?
Just starting to learn about Spark, reading about it... 61 60.4%
I have a small 1-node Spark cluster or VM that I'm playing around with 15 14.9%
I am currently building a Proof of Concept or Prototype to demonstrate a use case 21 20.8%
We are in production! 4 4%

Please select which of the following Big Data technologies you have at least "medium" level technical proficiency in:
HDFS 64 63.4%
MapReduce 50 49.5%
YARN 31 30.7%
Mesos 2 2%
Cascading 7 6.9%
Kafka 19 18.8%
Storm 9 8.9%
Flume 12 11.9%
HBase 20 19.8%
Cassandra 11 10.9%
Hive 42 41.6%
Impala 13 12.9%
Pig 24 23.8%
Parquet 16 15.8%
ZooKeeper 20 19.8%
MongoDB 26 25.7%
Couchbase 4 4%
Neo4j 5 5%
Titan 0 0%
Oozie 16 15.8%
Sqoop 17 16.8%
Giraph or Graphlab 2 2%
Accumulo 0 0%
Phoenix 3 3%
Tez 7 6.9%
ElasticSearch 15 14.9%
Lucene / Solr 19 18.8%
Math: Statistics, Linear Algebra, Calculus, Matrix math, etc 46 45.5%

How are you planning on deploying Spark within your organization?
with Databricks Cloud 6 5.9%
with Hadoop (YARN/HDFS) 55 54.5%
with Cassandra (Standalone mode) 10 9.9%
with pure Apache Spark (Standalone mode) 21 20.8%
with Mesos 6 5.9%
I don't know yet 37 36.6%

Where do you plan on deploying Spark clusters for your organization?
Within Amazon Cloud 31 30.7%
On-premise within our private data center 73 72.3%
A different cloud provider 17 16.8%

Which of the following Spark training sessions, if any, have you attended before?
AmpCamp training at UC Berkeley (Academic) 0 0%
SparkCamp training from Databricks (Industry) 3 3%
Cloudera Spark training 2 2%
Another vendor's Spark training 2 2%
Spark Summit conference 2 2%
None of the above 94 93.1%

Which industry do you work in?
IT / Systems / Solution Provider / IT Consultancy 53 52.5%
Banking / Finance 17 16.8%
Science & Technology 8 7.9%
Academia / University / Education 2 2%
Advertising / Marketing / PR 3 3%
Telecommunications 8 7.9%
Healthcare / Medical / Pharmaceuticals 5 5%
Publishing / Media 4 4%
Retailer / Distributor / Wholesale 2 2%
Government 6 5.9%
Insurance 0 0%
Legal 2 2%
Manufacturing / Design 6 5.9%
Nonprofit 2 2%
Business Services Consulting (Non-IT) 3 3%
Other 6 5.9%

Which of the following job categories best describes your role at your company?
Developer / Software Engineer / Software Architect 56 55.4%
Administrator / Operations / DevOps 8 7.9%
Data Scientist / Statistics / Machine Learning 40 39.6%
Management / Executive 8 7.9%
Sales / Marketing 3 3%
Other 5 5%

How far did you travel to attend this class?
Singapore: I live in Singapore already 54 53.5%
USA: I live in the Western half of the United States (like San Francisco, Seattle, Denver, Portland) 0 0%
USA: I live in the Eastern half of the United States (NYC, D.C., Atlanta, etc) 1 1%
INTERNATIONAL: I flew in from an Asian country like Japan, China, India, South Korea, etc 35 34.7%
INTERNATIONAL: I am coming from a European country 4 4%
INTERNATIONAL: Other 7 6.9%

OPTIONAL: Finally, please freely describe your experience with Spark so far.
RDD vs DataFrames; which one to focus on
I am a beginner. We are exploring Apache Spark to implement some of the use cases in our organization.
Fast & simpler than MR
Interesting
huge amount of data
class loader problems. :(
Australia
We are going to implement Spark in our current project
I have been exploring Spark mostly from Hadoop Data Processing
OPTIONAL: Is there anything you want to communicate to the instructor?
Want to hear more on Real-Time Architecture
I have heard that DataFrames are limited to 22 columns due to the tuple limitation. How do you overcome this?
Thank you
Please go slowly if possible, as I am new to Spark
Could you talk about the trade-offs between developing in RDDs vs DataFrames? DataFrames are great and reduce development time, but are RDDs significantly faster? Also, could you talk about the trade-offs between developing in Scala vs Python? Python is more easily maintainable but lags behind Scala in terms of Spark releases.
Nothing, as of now
[Chart: Number of daily responses]