Big Data, Hadoop, NoSQL and more ...

I gave a series of seminars at the following colleges in Solapur.

1. Walchand Institute of Technology, Solapur.
2. Brahmdevdada Mane Institute of Technology, Solapur.
3. Orchid College of Engineering & Technology, Solapur.
4. SVERI's College of Engineering, Pandharpur.

The series focused on what Big Data is and how the next generation of professionals should get ready for the Big Data revolution.

Big Data, Hadoop, NoSQL and more ... Presentation Transcript

  • 1. © Orzota, Inc. 2013
  • 2. Big Data, Hadoop, NoSQL and more …
      Varad Meru, Software Development Engineer, Orzota, Inc.
      varad@orzota.com | in.linkedin.com/in/vmeru | @vrdmr
  • 3. About Orzota
      - Mission: make Big Data easy to consume.
      - Offers Big Data/Hadoop solutions and software services to companies.
      - Develops software to help companies consume Big Data.
      - Founded in March 2012.
      - Headquartered in Silicon Valley, California, with offshore offices in Chennai, India.
  • 4. About Orzota (contd.)
      We work on:
      - Big Data
      - Hadoop
      - Cloud technologies
      - Data science
      - Products and services
      - Everything it takes to be a valued player.
  • 5. About Orzota (contd.): Community Development
      - Occasional seminars by architects, engineers and managers.
      - We invite professionals and aspiring professionals to join the Big Data/Hadoop communities in their regions.
      - Pune Hadoop User Group: participant and organizer.
      - Chennai Hadoop User Group: participant and sponsor.
  • 6. About Me
      - Orzota, Inc.: currently working with Hadoop, Mahout, cloud, etc.
      - Past work experience: Persistent Systems (search, recommendation engines and user behavior analytics).
      - Areas of interest: data science, information retrieval, distributed systems.
  • 7. Some of the Innovation Centers in the Technological World
  • 8. Agenda
      - Introduction to Big Data: technologies and domain
      - Hadoop Ecosystem: introduction to MapReduce; architecture (HDFS + MapReduce)
      - NoSQL databases: CAP theorem; different NoSQL databases
      - Other trends
  • 9. Big Data
  • 10. Big Data
      - What is Big Data?
      - What does it mean to me?
      - Why so much fuss in the industry?
      - Who uses these technologies?
      - How are they used in industry and academia?
      - When should I start using them?
      - How do I learn them?
  • 11. Big Data: the 3 Vs
      - Volume: amassing terabytes, even petabytes, of information.
          - 12 terabytes of Tweets created each day.
          - 350 billion annual meter readings.
      - Velocity: sometimes 2 minutes is too late.
          - Scrutinize 5 million trade events.
          - 500 million daily call detail records.
      - Variety: Big Data is any type of data.
          - 80% data growth in images, video and documents.
      "Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization." (Douglas Laney, "The Importance of Big Data: A Definition")
  • 12. Problem
      Store and process data for:
      - Search engines
      - Recommendation engines
      - Fraud detection
      - Aadhaar (Govt. of India)
      - Spam detection, etc.
      Also, in some cases, in real time (e.g. Facebook).
  • 13. Solutions?
      Classical solutions:
      - Database + programming language (Java + Oracle, C# + SQL Server)
      - Data warehouses: Teradata, Netezza, Microsoft PDW
      - Legacy network systems: Novell, CORBA, Java RMI (RPC)
  • 14. Problems of the Solutions
      Problems with classical solutions:
      - CAP theorem, by Prof. Eric Brewer (Berkeley): choose any 2 of consistency, availability and partition tolerance.
      - ACID properties:
          - For a small number of transactions, the cumulative overhead is still manageable.
          - But what about a very large number of transactions, such as Facebook posts?
      - Very high licensing fees.
      - Closed source: you are locked into the vendor's ecosystem.
  • 15. Solution to the Problems of the Solutions
      - Focus on the problem domain:
          - What is more important for your solution: consistency, availability or partition tolerance?
          - Which industries/companies already face similar problems?
          - How and where will you collect the data?
      - Technology fields: Internet companies (Hadoop, NoSQL datastores).
      - Open source, free and with friendly licenses.
  • 16. Hadoop Ecosystem
  • 17. Introduction
      - Started by Doug Cutting and Mike Cafarella for Nutch, an open-source search engine.
      - Further developed at Yahoo! and Facebook, with contributions from people at many companies.
      - Named after a little toy elephant owned by Doug's son.
      - Inspired by 2 research papers from Google:
          - The Google File System (2003)
          - MapReduce (2004)
  • 18. Introduction (contd.)
      - Contains 3 modules:
          - Distributed File System (HDFS)
          - MapReduce
          - Commons: a Java library of common functions used by both the DFS and MapReduce
      - Apache top-level project; website: hadoop.apache.org
      - Two parallel release lines: 1.x and 2.x
  • 19. Introduction (contd.)
      A rich ecosystem has been built around Hadoop:
      - Hive: large-scale data warehouse
      - HBase: NoSQL database
      - Pig: a data-flow language on top of Hadoop
      - Flume: log management for Hadoop
      - Oozie: workflow framework
      - Mahout: machine learning library on top of Hadoop
      - Vaidya: performance diagnostic framework for MapReduce jobs
      - MRUnit: unit testing framework for MapReduce programs
      - And many more …
  • 20. MapReduce
      MapReduce in 2 minutes. Problem statement: compute the sum of the doubles of a set of numbers.
      - Input: 1, 3, 4, 5, 6, 8, 9, 11, 17, 21
      - Intermediate array after processing (x → 2x): 1 → 2, 3 → 6, 4 → 8, 5 → 10, 6 → 12, 8 → 16, 9 → 18, 11 → 22, 17 → 34, 21 → 42
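The arithmetic of this example can be written out as a worked equation. Here f is the map function applied to each entry and g is the reduce function that combines the mapped values; x_1, …, x_10 are the ten input numbers listed above.

```latex
f(x) = 2x, \qquad g(y_1, \dots, y_n) = \sum_{i=1}^{n} y_i
g\bigl(f(x_1), \dots, f(x_{10})\bigr) = \sum_{i=1}^{10} 2x_i = 2 \sum_{i=1}^{10} x_i = 2 \cdot 85 = 170
```

Because f is applied to each entry independently, the ten multiplications can run on different machines; only the final sum needs all the mapped values in one place.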
  • 21. Introduction (contd.): Mapping Phase
      - Split the input.
      - The master node, which holds the mapping code f(x), sends the slave nodes (DataNodes) the code of the function to be applied to individual entries of the array. In Hadoop, this code is written in the map() method.
      - Each slave node applies f(x) to its data split. In our case, a data piece is one entry from the array.
      [Figure: Mapping Phase. The master node sends f(x) to the slave nodes, each holding part of the input array.]
  • 22. Introduction (contd.): Spill Phase
      - The master node directs the mappers to send the processed f(x) output data to an intermediate location.
      - Shuffle and sort.
      [Figure: Spill Phase: Shuffle and Sort. The results of the processed data from the slave nodes are given to the specific node where the reducer function runs.]
  • 23. Introduction (contd.): Reduce Phase
      - The master node (JobTracker) invokes the reduce task once spilling is over.
      - The reducer gets the location of the spill output from the master node (JobTracker) and applies g(x), the sum of the doubled values: g(x) = 170.
      [Figure: Reducer Phase. The results of the processed data from the slave nodes are given to the specific node where the reducer function runs.]
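The three phases can be imitated on a single machine to make the data flow concrete. This is a minimal in-memory sketch, not Hadoop code: the "slave nodes" are plain Python lists, and the round-robin split, the single shared key "sum" and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Input array from the "sum of doubles" example.
NUMBERS = [1, 3, 4, 5, 6, 8, 9, 11, 17, 21]


def split_input(data, num_nodes):
    """Splitting: deal the input entries out to the slave nodes (round-robin here)."""
    splits = [[] for _ in range(num_nodes)]
    for i, value in enumerate(data):
        splits[i % num_nodes].append(value)
    return splits


def map_fn(x):
    """f(x): emit a (key, value) pair; one shared key sends every value to the same reducer."""
    return ("sum", 2 * x)


def reduce_fn(key, values):
    """g: combine all values that share a key."""
    return key, sum(values)


def run_job(data, num_nodes=3):
    # Mapping phase: each "node" applies f(x) to its own split.
    spills = [[map_fn(x) for x in split] for split in split_input(data, num_nodes)]

    # Spill phase: shuffle the intermediate pairs from all nodes and group them by key.
    grouped = defaultdict(list)
    for spill in spills:
        for key, value in spill:
            grouped[key].append(value)

    # Reduce phase: apply g once per key, visiting keys in sorted order.
    return dict(reduce_fn(key, grouped[key]) for key in sorted(grouped))


print(run_job(NUMBERS))  # {'sum': 170}
```

Because every pair carries the same key, the shuffle routes all ten values to one reducer; a job with many distinct keys would spread them over several reducers.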
  • 24. MapReduce Programming
      Steps involved in writing a MapReduce program:
      - Write the Mapper.
      - Write the Reducer.
      - Write the Driver.
      Life's simple until you start customizing and working on data cleansing.
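With Hadoop Streaming, the mapper and reducer can be ordinary programs that read lines and write tab-separated key/value lines, and the framework sorts the mapper output by key before the reducer sees it. The sketch below follows that line convention for the sum-of-doubles job. It is a local stand-in under stated assumptions: the driver simply calls sorted() in place of submitting a real Hadoop job, and the function names are hypothetical.

```python
import itertools


def mapper(lines):
    """Mapper: one number per input line -> 'key<TAB>value' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines: a tiny taste of data cleansing
        yield f"sum\t{2 * int(line)}"


def reducer(sorted_lines):
    """Reducer: input arrives sorted by key, so lines with equal keys are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def driver(input_lines):
    """Driver: wires mapper -> sort (local stand-in for shuffle/sort) -> reducer."""
    return list(reducer(sorted(mapper(input_lines))))


if __name__ == "__main__":
    sample = ["1", "3", "4", "5", "6", "8", "9", "11", "17", "21"]
    for out in driver(sample):
        print(out)  # prints: sum<TAB>170
```

Keeping the mapper and reducer as small functions over lines makes them easy to unit test on their own, which matters once the data-cleansing rules start to grow.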
  • 25. Hadoop: Bird's Eye View
      [Figure: a Job Tracker and a Name Node coordinating many slave nodes, each running a DataNode (DN) and a TaskTracker (TT); the legend distinguishes DFS message paths from MapReduce processing messages.]
  • 26. NoSQL: Not Only SQL
  • 27. Introduction
      Non-relational databases:
      - Data model not bound by a schema.
      - No predetermined schema; columns are defined at run time.
      - Sample data:
          - Twitter streams
          - Web forms
          - Sensor networks
  • 28. Schema-less Systems
      Entry 1: {"name": "emp1"}
      Entry 2: {"name": "emp2", "e_id": "1", "e_addr": "Cupertino"}
      Entry 3: {"name": "emp3", "e_id": "3"}
      Entry 4: {"name": "emp4", "e_id": "6", "dob": "03-Sep-1964"}
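The four entries above can be loaded into a simple in-memory "collection" to show what schema-less storage means in practice. This is a minimal sketch with Python's standard json module, not the API of any particular NoSQL database; the variable names are hypothetical.

```python
import json

# The four employee entries from the slide, as JSON text.
RAW_ENTRIES = [
    '{"name": "emp1"}',
    '{"name": "emp2", "e_id": "1", "e_addr": "Cupertino"}',
    '{"name": "emp3", "e_id": "3"}',
    '{"name": "emp4", "e_id": "6", "dob": "03-Sep-1964"}',
]

# No table definition: each document simply carries the fields it has.
collection = [json.loads(raw) for raw in RAW_ENTRIES]

# "Run-time columns": the set of fields is discovered from the data itself.
all_fields = sorted({field for doc in collection for field in doc})

# Queries must tolerate missing fields instead of relying on NULL columns.
with_id = [doc["name"] for doc in collection if "e_id" in doc]

print(all_fields)  # ['dob', 'e_addr', 'e_id', 'name']
print(with_id)     # ['emp2', 'emp3', 'emp4']
```

In a relational table, all four rows would need the same four columns, with NULLs where an employee has no address or date of birth.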
  • 29. Business Requirements
      - High writes, low reads: sensor networks, Large Hadron Collider, click logging.
      - High reads, low writes: archival storage.
      - No fixed schema.
      Open question: where else?
  • 30. NoSQL Types
      - Key-value pair: Riak, Voldemort, etc.
      - Document-oriented: CouchDB, MongoDB, etc.
      - BigTable implementations: Cassandra, Hypertable, HBase, etc.
      - Graph-oriented: Neo4j, etc.
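One rough way to see how these four families differ is to model each one's data shape with plain Python dictionaries. This is an illustrative sketch only: it uses none of these products' APIs, and the keys, the column families "info" and "address", and the reports_to relationships are made up.

```python
# Key-value store (Riak, Voldemort style): the value is opaque bytes; access is by key only.
kv_store = {"emp:2": b'{"name": "emp2", "e_addr": "Cupertino"}'}
raw_value = kv_store["emp:2"]  # the store itself cannot look inside this value

# Document store (CouchDB, MongoDB style): the store can query fields inside each value.
doc_store = {
    "emp1": {"name": "emp1"},
    "emp2": {"name": "emp2", "e_id": "1", "e_addr": "Cupertino"},
}
in_cupertino = [key for key, doc in doc_store.items() if doc.get("e_addr") == "Cupertino"]

# BigTable-style store (Cassandra, Hypertable, HBase style):
# row key -> column family -> column -> value.
wide_table = {
    "emp2": {
        "info": {"name": "emp2", "e_id": "1"},
        "address": {"city": "Cupertino"},
    },
}
city = wide_table["emp2"]["address"]["city"]

# Graph store (Neo4j style): nodes plus relationships; queries follow edges.
reports_to = {"emp1": ["emp2"], "emp2": ["emp3"], "emp3": []}  # hypothetical relationships


def reachable(graph, start):
    """Depth-first traversal: every node reachable from start, including start."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen
```

The trade-off runs from the key-value store (simplest, but the application must decode every value) to the graph store (relationships are first-class, so multi-hop questions become traversals).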
  • 31. Introduction
      Source: http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies/
  • 32. Wake Up: Conclusion Time
      - Big Data is on the rise, in both the technology and the domain.
      - Smart engineers with Big Data skills are needed.
      - A chance to develop niche areas of expertise even before stepping into the industry.
      - 3rd-year students: select your final-year projects very carefully, using the tools mentioned in this seminar.
      - 4th-year students: equip yourselves with the necessary skills for better industry opportunities.
  • 33. Recommendations
      I recommend that aspiring and young professionals read:
      - How to Solve It by Computer, R. G. Dromey
      - Code Complete 2, Steve McConnell
      - Advanced Programming in the UNIX Environment, Richard Stevens
      - Many books on Hadoop, NoSQL datastores and Big Data in general
      … and many more
  • 34. Questions?
  • 35. Thank You
      Contact us at: Linkedin.com/company/orzota-inc, Twitter.com/orzota