Berkeley Lecture

Silicon Valley Startups, Intro to Big Data, Job Hunting Skills

Published in: Engineering

  1. Let’s talk about it. Dong Fei (董飞) @ Berkeley, Feb. 08, 2015. What do we talk about when we talk about Silicon Valley, Start-ups and Big Data
  2. Agenda • Who am I? • The Billboards in Silicon Valley • Why Big Data in the Spotlight? • What is Industry Practice in Big Data? • How to get your first job? • Reference
  3. Who am I? Dong Fei (董飞), Software Engineer at Coursera, author of popular Zhihu posts: • What is it like to work at Coursera? • What are your impressions of LinkedIn? • Which courses on Coursera are worth recommending? • Which startups in China with a Silicon Valley style are most impressive? • Which are the hottest high-tech startups in Silicon Valley in 2015? • How do you assess the online education industry at its current stage?
  4. Agenda • Who am I? • The Billboards in Silicon Valley • Why Big Data in the Spotlight? • What is Industry Practice in Big Data? • How to get your first job? • Reference
  5. The Billboards in Silicon Valley • Mature Company (over 100,000 employees): Microsoft, IBM, Intel, Oracle, … • Public Company (5,000 ~ 50,000 employees): Google, Facebook, LinkedIn, Twitter, … • Pre IPO (1,000 ~ 5,000 employees): Uber, Palantir, Cloudera, Square, … • Start-up (100 ~ 500 employees): Quora, Houzz, Coursera, Quixey, …
  6. The Billboards in Silicon Valley Mobile • Big Data • Consumer Internet • Health • e-Business • Payment • O2O APP
  7. The Billboards in Silicon Valley (2012, 2013, 2014)
  8. The Billboards in Silicon Valley: Gartner's 2014 Hype Cycle for Emerging Technologies Maps the Journey to Digital Business
  9. Future Trends • In the future, machines will control 98% of people. There is only one choice: be one of the 2%, or be one of the 98%. • Over the next five years, the Internet of Things will bring enormous startup opportunities. • Turing test
  10. Agenda • Who am I? • The Billboards in Silicon Valley • Why Big Data in the Spotlight? • What is Industry Practice in Big Data? • How to get your first job? • Reference
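  11. What is Big Data? • Exponential growth • Industrial Revolution • Moore's Law • Information explosion • The Singularity Is Near
  12. What is Big Data? • Exponential growth • Industrial Revolution • Moore's Law • Information explosion • The Singularity Is Near. “Technology will approach some essential singularity in human history, beyond which human affairs could not continue in the form we know.”
  13. Applied Areas • Job hunting • Mobile apps • E-commerce • Online education • Internet finance • Digital healthcare
  14. Numbers Everybody Should Know * Jeff Dean. Designs, lessons and advice from building large distributed systems. Keynote from LADIS, 2009.
  15. Big Data in China • Box Computing, Baidu Brain, Baidu Eye, IDL • Singles' Day (Double 11), Alibaba Cloud, Data Cube, OceanBase • WeChat connects everything • “If Xiaomi's big data has no value in two years, we will go bankrupt” (Lei Jun)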
  16. Big Data Landscape http://thinkbig.teradata.com/leading_big_data_technologies/big-data-reference-architecture/
  17. Database: Relational DB, NoSQL
  18. Where Did Hadoop Start? Google Big Data Papers • MapReduce: Simplified Data Processing on Large Clusters • The Google File System • BigTable: A Distributed Storage System for Structured Data
  19. Why Hadoop? • Scalability: scales simply by adding nodes; local processing avoids network bottlenecks • Efficiency: cost efficiency (<$1k/TB) on commodity hardware; unified storage, metadata, security (no duplication or synchronization) • Flexibility: all kinds of data (blobs, documents, records, etc.) in all forms (structured, semi-structured, unstructured); store anything, then later analyze what you need
  20. Why Ecosystem? We started with: • HDFS • MapReduce. Now we have: • I/O • Processing • Applications • Configuration • Workflow
  21. MapReduce • Programming paradigm • Batch oriented, not realtime • Based on Google’s paper
  22. Problem with MR • Very low-level: requires a lot of code to do simple things • Very constrained: everything must be described as “map” and “reduce”. Powerful but sometimes difficult to think in these terms.
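To make the “map” and “reduce” framing on slides 21 and 22 concrete, here is a minimal single-process sketch of word count in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. It does show how even a trivial task has to be split into a map step, a grouping step, and a reduce step.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (a framework does this across nodes)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big deal"]))
```

In a real cluster the shuffle step is performed by the framework and the input is read from distributed storage; this sketch only mirrors the shape of the programming model.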
  23. Spark Intro • AMP Lab • Startup • 100x faster than Hadoop?
  24. Berkeley Data Analytics Stack
  25. Why Spark? ● Flexible like MapReduce ● High performance ● Machine learning, iterative algorithms ● Interactive data explorations ● World record for sorting: https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
  26. Spark on MOOC
  27. LinkedIn Practice: Hadoop • Uses: ad hoc, production batch • Ecosystem: Hive, Pig; Azkaban (workflow); Avro data; data in: Kafka; data out: Voldemort, Kafka
  28. Agenda • Who am I? • The Billboards in Silicon Valley • Why Big Data in the Spotlight? • What is Industry Practice in Big Data? • How to get your first job? • Reference
  29. Offer Letter ● Position ● Salary ● Stock Options ● Vacation ● Employment at will
  30. How to Start? ● Foundation o Classic Books: “Advanced Programming in the UNIX Environment”, “Introduction to Algorithms”, “Computer Systems: A Programmer's Perspective”, “Java Design Pattern”, “JavaScript the Definitive Guide”
 ● Target o Data Science o Application o Infrastructure
 ● Practice https://www.google.com/edu/tools-and-solutions/guide-for-technical-development/ o API, debug o Find Pattern o Real Problem, Intuition
  31. How to Prepare for a Tech Interview? • Coding: bug-free strstr(), memmove() • Algo/Data Structure: Binary Search, Hashtable, Heap, Trie • System Design: Database, Network, Distributed System • Misc.: Linux, SQL, Probability • Behavioral Questions: Culture, Passion, Love Product
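As an illustration of what “bug-free” means for the coding items on slide 31, here is a hedged Python sketch of a `strstr`-style substring search and an iterative binary search. The slide names the C functions; this is a Python analogue written for practice, not the C library implementation, and the edge cases covered (empty needle, needle longer than haystack, empty list) are the ones assumed to matter most in an interview.

```python
from typing import Sequence


def strstr(haystack: str, needle: str) -> int:
    """Return the index of the first occurrence of needle in haystack, or -1."""
    if needle == "":
        return 0  # convention: the empty string matches at position 0
    n, m = len(haystack), len(needle)
    # If m > n the range is empty and the loop body never runs.
    for start in range(n - m + 1):
        if haystack[start:start + m] == needle:
            return start
    return -1


def binary_search(items: Sequence[int], target: int) -> int:
    """Return an index of target in the sorted sequence items, or -1."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2  # overflow-safe midpoint in fixed-width languages
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


if __name__ == "__main__":
    print(strstr("hadoop", "doop"))
    print(binary_search([1, 3, 5, 7], 5))
```

The naive search runs in O(n·m) time, which is usually acceptable on a whiteboard; the point is handling the boundaries correctly before optimizing.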
  32. How to Get an Interview Opportunity? • Find position • Referral, LinkedIn, headhunter • Resume polish, perfect match • Initial touch with recruiter • Phone screen, whiteboard test • Onsite: 4~7 rounds
  33. How to Choose an Offer? • Company Review – Rank: Glassdoor – Financing: Crunchbase – Traffic: Alexa, comScore – Talent: LinkedIn
 • Public, Pre IPO, Tech-Driven, Enterprise, Mobile
 • Compensation/Benefit – Base, Bonus, Stock/Option
 • Interest, Growth Platform, Pressure
 • There is no absolutely right choice; a choice only becomes meaningful through the effort you put into making it work!
  34. Reference • Introduction to Database Systems • Hadoop: The Definitive Guide, 3rd edition • Hadoop in Action • How to Start a Startup?
 http://startupclass.samaltman.com/ • The AI Revolution: The Road to Superintelligence http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html • What are the most common algorithm interview questions at Internet companies? http://www.zhihu.com/question/24964987/answer/29611950
  35. Contacts: Dong Fei (董飞), Software Engineer at Coursera, dongfeiwww@gmail.com, www.linkedin.com/in/dongfei, www.zhihu.com/people/dongfei, www.facebook.com/donglaoshi123. LinkedIn, Zhihu, 董老师说码工那点儿事 (“Teacher Dong on the life of a coder”)
  36. Thank You
