Gail Zhou on "Big Data Technology, Strategy, and Applications"

Dr. Gail Zhou presented this topic at DevNexus on Feb 25, 2014. Big Data history, opportunities, and applications. Big Data key concepts, reference architecture with open source technology stacks. Hadoop architecture explained (HDFS, Map Reduce, and YARN). Big Data start-up challenges and strategies to overcome them. Technology update: Hadoop and Cassandra based technology offerings.

1. DevNexus 2014, Data + Integration
   Big Data Technology, Strategy, and Applications
   Dr. Gail Zhou, Gail Z Associates, LLC
   February 25, 2014
   LinkedIn: http://www.linkedin.com/in/gailZhou
   Email: gail.r.zhou@gmail.com
2. Outline
   • What is Big Data and why is it such a big deal? Where can we use Big Data?
   • Big Data Key Concepts and Technologies, using Hadoop as an example
   • Big Data Challenges and Startup Strategy: What are the challenges? How do you get started on Big Data?
   • Appendix: Other Big Data Technologies, Integration of Big Data with Existing Applications (an example)
3. What is Big Data and why is it such a big deal?
4. A Brief History of Big Data
   Sources: Wikipedia, Forbes.com, and other articles
   • 1941: “Information Explosion” term coined.
   • 1963: Physicist and science historian Derek Price concluded that the number of new journals had grown exponentially.
   • 1990: Computer scientist Peter J. Denning, “Saving All the Bits”: what machines can we build to monitor, process, and understand the data, its meanings, and patterns? Intelligence out of the data?
   • 1998: Steve Bryson et al., “Visually exploring gigabyte data sets in real time”, ACM, section “Big Data for Scientific Visualization”.
5. A Brief History of Big Data Cont’d
   Sources: Wikipedia, Forbes.com, and other articles
   • 2001: Doug Laney, Meta Group, “3D Data Management: Controlling Data Volume, Velocity, and Variety” (more V’s have since been added: Veracity, Variability, and Value)
6. A Brief History of Big Data Cont’d
   Sources: Wikipedia, Forbes.com, and other articles
   • 2001 – 2003: Google grew rapidly as a result of its new revenue model, 5 cents per click. Google is now a giant Big Data leader.
   • 1994 – Present: Yahoo!, Hadoop shop (10K nodes), Genome, Big Data analytics.
   • 1994 – Present: Amazon, AWS Cloud.
   • 2003 – Present: Facebook, Twitter, LinkedIn, etc.
   • 2013 and beyond: Many others.
8. Population Growth Chart: Does it have something to do with Big Data?
   Machines, satellites, cameras, the Internet, computers, and mobile phones are just “enablers” of Big Data.
   Source: Global Education Project
9. Information Explosion. It is only the beginning.
   You’ve got mail (too much). You are embarrassed to admit you don’t know about a lot of the cool things happening in the world. Don’t despair. You are not alone.
   Image sources: Newbury College, UK; www.spchui.net; www.ucg.org
10. Big Data Opportunities
11. Big Data Opportunities
   • Medical Research and Healthcare: Massive amounts of collected research and clinical information can be used to predict and prevent diseases, moving us from ‘sick care’ to ‘health care’.
   • Telecom: Traffic data and patterns can be used in real time to re-route traffic.
   • Defense: Satellite images and other information can be mashed up to identify threats.
   • Utilities: Smart meter monitoring.
   • Public Safety: Pattern recognition and social media can help predict crimes.
   • Financial Industry: Pattern recognition and business rules to flag fraudulent activities.
   • Functional Areas: Investigational Search, Pricing Optimization, Risk Analysis, Churn Analysis, Behavior Analysis, Transactions Analysis, Revenue Assurance, Recommendation Engines, etc.
12. Where Big Data Can Shine
   • Traditional (Examples)
     – Financial Transactions
     – Energy and Infrastructure
     – Transportation
     – Life Science and Healthcare
   • Big Data (Examples)
     – Advertisements
     – Search and Indexing
     – Social Networks
     – Science Research
     – Communications
   • Notes
     – Big Data technology is not a replacement for traditional technology
     – Big Data is complementary
     – In some cases, Big Data is the only way to get things done
     – Big Data has its own challenges
13. Key Concepts in Big Data – Technology and Architectures
16. Hadoop HDFS
   Files are split into blocks (64 MB, 128 MB, etc.), which are stored on different nodes with a replication factor (default 3).
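
The block-and-replica idea on slide 16 can be illustrated with a short Python sketch. It is a simplified model built on stated assumptions. The function and class names are hypothetical. The 128 MB block size and the replication factor of 3 come from the slide. Replicas are placed round-robin across nodes, which is not the rack-aware policy that the HDFS NameNode actually applies.

```python
"""Toy model of HDFS-style block splitting and replica placement (hypothetical names)."""
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class Block:
    block_id: int
    offset: int      # byte offset of the block within the file
    length: int      # bytes in this block (the last block may be shorter)
    replicas: list   # names of the DataNodes holding a copy


def split_into_blocks(file_size, nodes, block_size=128 * MB, replication=3):
    """Split a file of `file_size` bytes into fixed-size blocks and assign
    each block to `replication` distinct nodes, round-robin (an assumption;
    real HDFS placement is rack-aware)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    blocks = []
    offset = 0
    block_id = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        # Start each block's replicas on a different node to spread the load.
        replicas = [nodes[(block_id + i) % len(nodes)] for i in range(replication)]
        blocks.append(Block(block_id, offset, length, replicas))
        offset += length
        block_id += 1
    return blocks


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    # A 300 MB file with 128 MB blocks yields blocks of 128, 128, and 44 MB.
    for b in split_into_blocks(300 * MB, nodes):
        print(b.block_id, b.length // MB, "MB", b.replicas)
```

With four nodes, a 300 MB file becomes three blocks, and each block is stored on three different nodes. Losing any single node therefore leaves at least two copies of every block. In real HDFS, the final shorter block occupies only as much disk space as its data.
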
17. Hadoop Logical View
   http://nosqlessentials.com, Professor: Fernando Rodriguez Olivera
18. Hadoop Logical View (HDFS + MapReduce)
19. Hadoop V1 – MapReduce Job Execution
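
Slides 18 and 19 present MapReduce as diagrams. The phases they depict are map, shuffle/sort, and reduce, and they can be sketched in-process with the classic word-count example. This is a hypothetical Python illustration, not Hadoop's Java MapReduce API. Each "split" stands in for an HDFS block that one map task would process. The task scheduling that the Hadoop V1 JobTracker performs is omitted.

```python
"""In-process simulation of MapReduce word count (hypothetical, not Hadoop's API)."""
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Mapper: emit (word, 1) for every word in one input split."""
    for line in split:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle/sort: group values by key and order the keys, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: sum the counts emitted for one word."""
    return key, sum(values)


def word_count(splits):
    # In Hadoop, one map task would run per split, ideally on a node holding that block.
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = [
        ["big data is big"],         # split 1
        ["hadoop stores big data"],  # split 2
    ]
    print(word_count(splits))
    # {'big': 3, 'data': 2, 'hadoop': 1, 'is': 1, 'stores': 1}
```

The reducer sees only grouped (key, list of values) pairs, so it works regardless of how many splits or mappers produced its input. That independence is what allows map tasks to run in parallel on the nodes that store the data.
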
20. Hadoop 2.0 with YARN
21. YARN Interaction & Sequence
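
The YARN interaction sequence on slides 20 and 21 can be approximated with a small Python walk-through that records who talks to whom. All class and method names are hypothetical simplifications. The round-robin container placement is also a simplification, since YARN uses a pluggable scheduler. The sketch shows only the order of steps:

1. The client submits an application to the ResourceManager.
2. The ResourceManager starts an ApplicationMaster in a container on a NodeManager.
3. The ApplicationMaster registers, requests containers for its tasks, and unregisters when finished.

```python
"""Toy walk-through of the YARN application lifecycle (hypothetical names)."""


class ToyResourceManager:
    """Stands in for the YARN ResourceManager (RM)."""

    def __init__(self, node_managers):
        self.node_managers = node_managers  # names of NodeManager (NM) hosts
        self.next_node = 0
        self.log = []  # ordered record of interactions

    def allocate_container(self, purpose):
        # Round-robin placement; real YARN delegates this to a pluggable scheduler.
        nm = self.node_managers[self.next_node % len(self.node_managers)]
        self.next_node += 1
        self.log.append(f"RM: allocate container on {nm} for {purpose}")
        return nm

    def submit_application(self, app_name, num_tasks):
        self.log.append(f"Client: submit {app_name} to RM")
        # The RM first starts the application's own ApplicationMaster (AM).
        am_node = self.allocate_container("ApplicationMaster")
        self.log.append(f"{am_node}: launch ApplicationMaster for {app_name}")
        ToyApplicationMaster(self, app_name).run(num_tasks)


class ToyApplicationMaster:
    """Stands in for a per-application ApplicationMaster."""

    def __init__(self, rm, app_name):
        self.rm = rm
        self.app_name = app_name

    def run(self, num_tasks):
        log = self.rm.log
        log.append(f"AM({self.app_name}): register with RM")
        for task in range(num_tasks):
            # The AM negotiates a container with the RM, then the chosen NM starts the task.
            log.append(f"AM({self.app_name}): request container for task {task}")
            nm = self.rm.allocate_container(f"task {task}")
            log.append(f"{nm}: launch task {task}")
        # Simplification: tasks are treated as finished immediately.
        log.append(f"AM({self.app_name}): all tasks done, unregister from RM")


if __name__ == "__main__":
    rm = ToyResourceManager(["nm1", "nm2"])
    rm.submit_application("wordcount", num_tasks=2)
    for event in rm.log:
        print(event)
```

Compared with Hadoop V1, the key design change this illustrates is that each application has its own ApplicationMaster. The ResourceManager therefore only arbitrates resources and does not track every task itself.
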
22. Big Data Challenges, Suggested Startup Strategy
23. Big Data Startup Challenges
   • Business urgency, time-to-market pressures
   • Big Data startup needs careful planning
   • Big Data needs infrastructure, software stacks, people, and a startup plan
   • Lack of Big Data resources, lack of sponsorship (except in some companies)
   • Big Data is complex and requires multiple skill sets (mostly new to many companies)
     – Infrastructure, Administration, Security, Programming, Testing, etc.
   • Skepticism about Big Data
   • Integration with Existing Technologies and Systems
     – Cannot develop isolated Big Data solutions
     – Integration with existing systems will be a top challenge (requires both sides to do additional work)
   • Open Source: Stability, Maturity, and Security
24. Suggested Big Data Startup Strategy
   • Full business needs and information requirements analysis. Business Drivers:
     – Revenue generation? Cost reduction? Customer retention? Compliance?
     – Process improvement? Fraud detection? Analytics? Dashboard?
     – Solving a tough problem? Retiring/replacing technologies and systems?
   • Technology Evaluation and Selection
     – Define requirements and objectives first
     – Evaluate a variety of technology stacks; develop a framework first
   • Executive Support for Startup Resources
   • Prototyping, Discovery, and Planning
     – Rent infrastructure in the cloud: VMware, Amazon EC2, and others
     – Use spare hardware and network bandwidth
     – Assessment, proposal, and project/program plan for next steps
   • Start Small and Keep Delivering
     – Architecture design, estimation, business case
     – Obtain funding and executive sponsorship, owners, etc.
     – SDLC: don’t forget hardware, security, testing, etc.
25. Appendix
26. Hadoop & Cassandra Based Offerings (Name: Offerings; Notes)
   • Apache Hadoop: Hadoop Core; Enhancement: YARN
   • Cloudera: Enhanced Hadoop; Leader
   • DataStax: Enhanced Apache Cassandra; Cassandra is a distributed NoSQL DB
   • Hortonworks: Hadoop development and support, Hortonworks Data Platform (HDP); Yahoo funded, $23M + others; major alliances
   • MapR: Develops and sells Hadoop-derived software (M3, M5, M7); alliances with EMC, Amazon, and Google
   • Sqoop: HDFS and SQL integration
   • Hue: Hadoop GUI tools
   • Amazon: AWS, cloud Hadoop cluster
   • Microsoft: Windows Azure HDInsight
   • IBM, Dell, etc.: Hardware, software, services
27. Hadoop Related Technologies (Examples) (Name: Functions; Notes)
   • Apache Hue: Hadoop GUI; Hadoop itself has a command-line interface
   • Apache HBase: NoSQL distributed DB, key/value column family store, runs on top of Hadoop; Bigtable-like storage for Hadoop, written in Java
   • Apache Pig: High-level programming language for MapReduce; Pig Latin, interoperability with Python, JavaScript, Ruby, and Groovy
   • Apache Hive: Data warehouse on top of Hadoop; HiveQL for summaries, queries, and analysis; open sourced by Facebook
   • Apache ZooKeeper: Configuration and coordination service for Hadoop; distributed configuration, synchronization, etc.
   • Apache Sqoop: Moves RDBMS data into Hadoop; command-line tool
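
Slide 27 describes HBase as a Bigtable-like key/value column family store. A toy Python model of that data layout may help: row key, then column family, then column qualifier, then timestamped versions of each cell. It is a hypothetical sketch, not the HBase client API. The class name, the `max_versions` default, and the in-memory dictionaries are all assumptions for illustration.

```python
"""Toy model of HBase's column-family data model (hypothetical, not the HBase API)."""
from collections import defaultdict


class ToyColumnFamilyTable:
    def __init__(self, families, max_versions=3):
        # Column families are declared when the table is created;
        # column qualifiers inside a family can be added freely per row.
        self.families = set(families)
        self.max_versions = max_versions  # assumed retention limit per cell
        # row key -> family -> qualifier -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, qualifier, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        versions = self.rows[row][family].setdefault(qualifier, {})
        versions[ts] = value
        # Drop the oldest versions beyond max_versions.
        for old_ts in sorted(versions)[:-self.max_versions]:
            del versions[old_ts]

    def get(self, row, family, qualifier):
        """Return the newest version of a cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Yield row keys in sorted order, from start_row (inclusive) to stop_row (exclusive)."""
        for row in sorted(self.rows):
            if start_row <= row < stop_row:
                yield row


if __name__ == "__main__":
    table = ToyColumnFamilyTable(["info"])
    table.put("user#42", "info", "email", "old@example.com", ts=1)
    table.put("user#42", "info", "email", "new@example.com", ts=2)
    print(table.get("user#42", "info", "email"))  # new@example.com
```

The model shows why HBase schema design centers on row keys and column families. Rows are kept sorted by key, so range scans are cheap. Families are fixed up front, while individual columns can vary from row to row.
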
28. Cassandra
   http://nosqlessentials.com, Professor: Fernando Rodriguez Olivera
29. HBase
   http://blog.sematext.com/2012/07/16/hbase-memstore-what-you-should-know/
30. http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/