Big data ppt

Usage Rights

CC Attribution-NonCommercial-ShareAlike License

Big data ppt Presentation Transcript

  • 1. Introduction to BIG DATA Thiru
  • 2. What is BIG DATA? http://www.forbes.com/sites/oreillymedia/2012/01/19/volume-velocity-variety-what-you-need-to-know-about-big-data
  • 3. Data Scalability Problems
      • Search Engine
          • 10KB/doc * 20B docs = 200TB
          • Reindex every 30 days: 200TB / 30 days ≈ 6.7 TB/day
      • Log Processing / Data Warehousing
          • 0.5KB/event * 3B pageview events/day = 1.5TB/day
          • 100M users * 5 events * 100 feed/event * 0.1KB/feed = 5TB/day
      • Multipliers: 3 copies of data, 3-10 passes of raw data
      • Processing Speed (Single Machine)
          • 2-20MB/second * 100K seconds/day = 0.2-2 TB/day
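The back-of-the-envelope figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes), which is what the slide's round numbers imply; the helper name `tb_per_day` and the final "machines needed" estimate are illustrative additions, not part of the original deck, and the estimate ignores the copy and pass multipliers.

```python
import math

# Decimal units (assumption: the slide's round numbers imply 1 KB = 1,000 bytes).
KB, MB, TB = 10**3, 10**6, 10**12

def tb_per_day(num_bytes, days=1):
    """Convert a byte count spread over `days` into terabytes per day."""
    return num_bytes / days / TB

# Search engine: 10 KB per document, 20 billion documents.
index_bytes = 10 * KB * 20 * 10**9
index_tb = index_bytes / TB                      # total index size in TB
reindex_rate = tb_per_day(index_bytes, days=30)  # rebuilt every 30 days

# Log processing: 0.5 KB per event, 3 billion pageview events per day.
pageview_rate = tb_per_day(0.5 * KB * 3 * 10**9)

# Feeds: 100M users * 5 events * 100 feeds per event * 0.1 KB per feed.
feed_rate = tb_per_day(100 * 10**6 * 5 * 100 * 0.1 * KB)

# Single machine: 2-20 MB/second over roughly 100K seconds in a day.
single_low = tb_per_day(2 * MB * 100 * 10**3)
single_high = tb_per_day(20 * MB * 100 * 10**3)

# Illustrative only: machines needed just to keep up with reindexing,
# ignoring the 3 copies and 3-10 passes mentioned on the slide.
machines_best_case = math.ceil(reindex_rate / single_high)
machines_worst_case = math.ceil(reindex_rate / single_low)

print(f"index size: {index_tb:.0f} TB, reindex: {reindex_rate:.2f} TB/day")
print(f"pageviews: {pageview_rate:.1f} TB/day, feeds: {feed_rate:.1f} TB/day")
print(f"one machine: {single_low:.1f}-{single_high:.1f} TB/day")
print(f"machines for reindexing alone: {machines_best_case}-{machines_worst_case}")
```

Even before the multipliers are applied, the daily volumes exceed what one machine can process, which is the motivation for spreading the work across a cluster.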
  • 4. What’s the social sentiment for my brand or products? How do I better predict future outcomes? How do I optimize my fleet based on weather and traffic patterns?
  • 5. Traditional E-Commerce Data Flow
  • 6. New E-Commerce Big Data Flow
  • 7. Introduction to Hadoop
  • 8. HADOOP
    Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named Map/Reduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both Map/Reduce and the distributed file system are designed so that node failures are automatically handled by the framework.
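The Map/Reduce paradigm described on this slide can be illustrated with a single-process word count. This is a minimal sketch in plain Python, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample fragments are assumptions made for illustration, and a real cluster would run the fragments on different nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(fragment):
    """Map: emit a (word, 1) pair for every word in one input fragment."""
    return [(word.lower(), 1) for word in fragment.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)

def run_job(fragments):
    # Each fragment is mapped independently of the others, which is why a
    # fragment lost to a node failure could simply be mapped again elsewhere.
    mapped = chain.from_iterable(map_phase(f) for f in fragments)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

# Hypothetical input, already split into small fragments of work.
fragments = ["big data is big", "hadoop stores big data"]
counts = run_job(fragments)
print(counts)
```

Because the map step has no shared state, the result does not depend on the order in which fragments are processed, which is the property that lets the framework schedule and retry them freely.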
  • 9. Hadoop History
      • Jan 2006 – Doug Cutting joins Yahoo
      • Feb 2006 – Hadoop splits out of Nutch and Yahoo starts using it
      • Dec 2006 – Yahoo creating 100-node Webmap with Hadoop
      • Apr 2007 – Yahoo on 1000-node cluster
      • Dec 2007 – Yahoo creating 1000-node Webmap with Hadoop
      • Jan 2008 – Hadoop made a top-level Apache project
      • Sep 2008 – Hive added to Hadoop as a contrib project
  • 10. BIG DATA ECONOMICS
      • Commodity hardware compatibility
      • Reduction in storage cost
      • Open source system
      • The Web economy
  • 11. Column Store Database
  • 12. Row Store and Column Store
  • 13. Why Column Store?
      • Can be significantly faster than row stores for some applications
          • Fetch only required columns for a query
          • Better cache effects
          • Better compression (similar attribute values within a column)
      • But can be slower for other applications
          • OLTP with many row inserts, ..
      • Long war between the column store and row store camps :-)
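The trade-offs listed on this slide can be seen in a toy in-memory layout. The sketch below is illustrative only: the sample table, its column names, and the helper functions are assumptions, and run-length encoding stands in for the many compression schemes a real column store might use.

```python
from itertools import groupby

# Hypothetical sample table; the column names are illustrative only.
rows = [
    {"id": 1, "country": "IN", "amount": 10},
    {"id": 2, "country": "IN", "amount": 25},
    {"id": 3, "country": "IN", "amount": 5},
    {"id": 4, "country": "US", "amount": 40},
]

# Row store: each record is kept together, so a scan walks whole records.
row_store = rows

# Column store: each attribute is kept together in its own array.
column_store = {name: [r[name] for r in rows] for name in rows[0]}

def total_amount_rows(store):
    """Sum one attribute by visiting every record."""
    return sum(record["amount"] for record in store)

def total_amount_columns(store):
    """Sum one attribute by reading only the column the query needs."""
    return sum(store["amount"])

def run_length_encode(values):
    """Compress runs of equal neighbouring values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def insert_record(store, record):
    """Append one record to a column store; every column array is written."""
    for name, values in store.items():
        values.append(record[name])
    return len(store)  # number of column arrays touched by the insert

print(total_amount_rows(row_store), total_amount_columns(column_store))
print(run_length_encode(column_store["country"]))
```

Both layouts return the same total, but the column version reads one array instead of every record, and the `country` column compresses well because similar values sit next to each other. The insert helper shows the other side: a single new row has to touch every column, which is why insert-heavy OLTP workloads tend to favour row stores.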
  • 14. So How Does It Work?
  • 15. So How Does It Work?
  • 16. The Hadoop Ecosystem
  • 17. Comparisons: Traditional RDBMS vs. MapReduce
  • 18. Hadoop Ecosystem
    HDFS, the storage layer of Hadoop, is a distributed, scalable, Java-based file system adept at storing large volumes of unstructured data.
    MapReduce is a software framework that serves as the compute layer of Hadoop. MapReduce jobs are divided into two (obviously named) parts. The “Map” function divides a query into multiple parts and processes data at the node level. The “Reduce” function aggregates the results of the “Map” function to determine the “answer” to the query.
    Hive is a Hadoop-based data warehouse developed by Facebook. It allows users to write queries in SQL, which are then converted to MapReduce. This allows SQL programmers with no MapReduce experience to use the warehouse and makes it easier to integrate with business intelligence and visualization tools such as MicroStrategy, Tableau, Revolution Analytics, etc.
    Pig Latin is a Hadoop-based language developed by Yahoo. It is relatively easy to learn and is adept at very deep, very long data pipelines (a limitation of SQL).
  • 19. Hadoop Ecosystem
    HBase is a non-relational database that allows for low-latency, quick lookups in Hadoop. It adds transactional capabilities to Hadoop, allowing users to conduct updates, inserts and deletes. eBay and Facebook use HBase heavily.
    Flume is a framework for populating Hadoop with data. Agents are deployed throughout one’s IT infrastructure (inside web servers, application servers and mobile devices, for example) to collect data and integrate it into Hadoop.
    Oozie is a workflow processing system that lets users define a series of jobs written in multiple languages, such as MapReduce, Pig and Hive, then intelligently link them to one another. Oozie allows users to specify, for example, that a particular query is only to be initiated after specified previous jobs on which it relies for data are completed.
    Whirr is a set of libraries that allows users to easily spin up Hadoop clusters on top of Amazon EC2, Rackspace or any virtual infrastructure. It supports all major virtualized infrastructure vendors on the market.
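The Oozie idea of starting a job only after the jobs it depends on have completed amounts to ordering a dependency graph. The sketch below shows that idea in plain Python using the standard library's `graphlib` (Python 3.9+). It is not Oozie's own workflow format or API, and the job names and the `run_workflow` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "ingest_logs": set(),
    "clean_logs_pig": {"ingest_logs"},
    "load_dimensions": set(),
    "daily_report_hive": {"clean_logs_pig", "load_dimensions"},
}

def execution_order(jobs):
    """Return the jobs in an order where dependencies always come first."""
    return list(TopologicalSorter(jobs).static_order())

def run_workflow(jobs, run_job):
    """Run every job once, only after all of its dependencies completed."""
    completed = []
    for name in execution_order(jobs):
        missing = jobs[name] - set(completed)
        if missing:
            raise RuntimeError(f"{name} started before {sorted(missing)}")
        run_job(name)
        completed.append(name)
    return completed

finished = run_workflow(workflow, run_job=lambda name: print("running", name))
```

Jobs with no dependency between them, such as the two inputs to the report here, could run in parallel; `static_order` simply picks one valid sequence.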
  • 20. Hadoop Ecosystem
    Avro is a data serialization system that allows for encoding the schema of Hadoop files. It is adept at parsing data and performing remote procedure calls.
    Mahout is a data mining library. It takes the most popular data mining algorithms for performing clustering, regression testing and statistical modeling and implements them using the MapReduce model.
    Sqoop is a connectivity tool for moving data from non-Hadoop data stores, such as relational databases and data warehouses, into Hadoop. It allows users to specify the target location inside of Hadoop and instruct Sqoop to move data from Oracle, Teradata or other relational databases to the target.
    BigTop is an effort to create a more formal process or framework for packaging and interoperability testing of Hadoop’s sub-projects and related components, with the goal of improving the Hadoop platform as a whole.
  • 21. Microsoft & Hadoop
  • 22. Insights to all users by activating new types of data
  • 23. Microsoft BI
  • 24. Microsoft Hadoop Stack (diagram)
      • Components: Stats processing (RHadoop), Machine Learning (Mahout), Graph (Pegasus), Pipeline / Workflow (Oozie), Metadata (HCatalog), Data Integration (ODBC / SQOOP / REST), Scripting (Pig), Query (Hive), NoSQL Database (HBase), Distributed Processing (MapReduce), Event Pipeline (Flume), Distributed Storage (HDFS), Monitoring & Deployment (System Center), Active Directory (Security)
      • Legend: Red = Core Hadoop, Blue = Data processing, Purple = Microsoft integration points and value adds, Yellow = Data Movement, Green = Packages, White = Coming Soon
  • 25. Others
  • 26. Hadoop Commercial Distributors
  • 27. Other Big Data Worlds
  • 28. Other Big Data Worlds
  • 29. Big Data Integrations, Visualizations & Analytics
  • 30. Thank You