Big data by_mcal

Big Data and Hadoop Training batch in Pune is scheduled to commence on December 7th, 2013. This batch will follow a new, revamped four-day schedule, with contents and focus based on feedback from participants of earlier courses. The training is conducted in a workshop-like environment with an effective blend of hands-on practicals and assignments to augment the fundamental theory covered.

About the Faculty:
He holds a doctorate in Engineering and is an industry veteran with more than twenty-five years of experience in launching new technologies, products and businesses. He has been involved in acquiring five patents for the company that he has worked for.

Big Data Analytics – Why?
Data is now generated by more sources and at ever-increasing rates. Examples include social media sites, GPS-based tracking systems, point-of-sale equipment, etc. The ability to process such data can provide the essential edge required for business success. Demand for Big Data professionals is rapidly increasing. Knowledge of Big Data can provide an advantage leading to faster professional advancement.

About this course
This course on Big Data Analytics for Business is a combination of essential fundamentals, practical techniques, hands-on sessions on Hadoop, and case studies to cement all this together.

By completing this course you will be able to …
• Understand fundamentals of analytics: Descriptive, Predictive and Prescriptive Analytics
• Know what ‘Big Data’, MapReduce and Hadoop are all about
• Get a grip on the structure of Big Data applications
• Effectively use Big Data techniques like MapReduce and tools like Hadoop, Hive, HBase, Pig
• Choose the most appropriate tools to solve Big Data problems
• Identify, propose and lead Big Data projects in your organizations

Course Content
• What is Big Data?
• Overview of Big Data tools and techniques
• In-depth coverage of MapReduce techniques to manage Big Data
• Hadoop – In Depth
• HDFS – In Depth
• Installing and managing Hadoop – Hands-on
• Introduction to Hadoop Clusters
• Hands-on session using native installation and Amazon EMR implementation of Hadoop
• The Hadoop ecosystem: Pig, Hive, HBase, Sqoop and Flume
• Analytics: Descriptive, Predictive and Prescriptive
• What is Big Data Analytics?
• Introducing Analytics in the enterprise: Case Studies
• Trends in Big Data Analytics

The course takes a "hands-on" approach to ensure that the basics are understood very well and assimilated concepts are applied in practice.

Essential prerequisite for the practitioner course: Java programming language.
Note: Basic Java module for participants who are new to Java.

Published in: Education, Technology, Business

Transcript

  • 1. Unraveling Big Data Copyright ©2013. MindMap IT Solution (P) Ltd. All right reserved.
  • 2. Our Goal for Today
      1. Evolution of digital data over the decades
      2. Why do we process data – and how?
      3. How has all this been changing in the last decade?
      4. What is Big Data and how to handle it?
      5. Who needs to understand Big Data?
      6. What are the Big Data related opportunities?
      7. Discussions and Q&A
  • 3. Setting The Context
  • 4. Bits, Bytes, and Beyond
      Name | Value | Example
      Bit | A BIT !! |
      Byte | 8 Bits | 1 Character
      Kilobyte | 1024 (1K) Bytes | About 150 words
      Megabyte | 1K Kilobytes | A small book
      Gigabyte | 1K Megabytes | 20 GB = All of Beethoven’s work
      Terabyte | 1K Gigabytes | 1000 copies of Encyclopedia Britannica
      Petabyte | 1K Terabytes | 500 billion pages of standard printed text
      Exabyte | 1K Petabytes | 5 EB = All words ever spoken by mankind
      Zettabyte | 1K Exabytes | 1 ZB = Entire planet’s digital content
      Yottabyte | 1K Zettabytes | 1 YB = will take 11 Trillion years to download!
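The table uses binary multiples, where each unit is 1024 times the previous one. A minimal Python sketch of that arithmetic follows. The names `UNITS`, `unit_in_bytes` and `humanize` are illustrative and do not come from the slides.

```python
# Binary storage units in the order listed on the slide:
# each step is 1024 times the previous one.
UNITS = ["Byte", "Kilobyte", "Megabyte", "Gigabyte", "Terabyte",
         "Petabyte", "Exabyte", "Zettabyte", "Yottabyte"]


def unit_in_bytes(name: str) -> int:
    """Return the number of bytes in one unit, e.g. 'Kilobyte' -> 1024."""
    return 1024 ** UNITS.index(name)


def humanize(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for name in UNITS:
        # Stop at the first unit where the value drops below 1024,
        # or at the last unit in the list.
        if value < 1024 or name == UNITS[-1]:
            return f"{value:.2f} {name}"
        value /= 1024


if __name__ == "__main__":
    print(unit_in_bytes("Terabyte"))
    print(humanize(20 * 1024 ** 3))
```

Under these definitions, the 20 GB quoted for Beethoven's work is `20 * 1024 ** 3` bytes.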
  • 5. History of Data Storage Capacity
      Year | Medium | Capacity
      1956 | Hard Drive from IBM | 5 MB
      1963 | Audio Tape | 663 KB
      1970 | Floppy Disk | 80 KB
      1976 | Floppy Disk | 110 KB
      1981 | Floppy Disk | 1.4 MB
      1982 | CD | 700 MB
      1995 | DVD | 4.7 GB
      2003 | BLU RAY | 25 GB
      Hard Disks: Multi Terabyte
      WWW & CLOUD
  • 6. Cost Per Gigabyte
      Year | Cost / GB
      1980 | $ 3,000,000
      1990 | $ 8,000
      2000 | $ 30
      2010 | $ 0.08
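To put the cost figures in perspective, the short sketch below computes the overall fold reduction and the average yearly change they imply. It uses only the four values on the slide; the function names are illustrative, and the constant yearly rate is a simplifying assumption.

```python
# Cost per gigabyte in US dollars, as given on the slide.
COST_PER_GB = {1980: 3_000_000.0, 1990: 8_000.0, 2000: 30.0, 2010: 0.08}


def fold_reduction(start_year: int, end_year: int) -> float:
    """How many times cheaper a gigabyte became between two years."""
    return COST_PER_GB[start_year] / COST_PER_GB[end_year]


def average_annual_change(start_year: int, end_year: int) -> float:
    """Average yearly change in cost, assuming a constant compound rate.

    A negative result means the cost fell, e.g. -0.25 is a 25% drop per year.
    """
    years = end_year - start_year
    ratio = COST_PER_GB[end_year] / COST_PER_GB[start_year]
    return ratio ** (1 / years) - 1


if __name__ == "__main__":
    print(f"1980 to 2010: {fold_reduction(1980, 2010):,.0f} times cheaper")
    print(f"Average yearly change: {average_annual_change(1980, 2010):.1%}")
```

By this estimate the cost per gigabyte fell by a factor of roughly 37.5 million over the thirty years, which works out to a drop of a little over 40% every year.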
  • 7. Prior to the 80’s
      • E-commerce did not exist.
      • Data entry, storage, and processing were sequential processes – and displaced in time.
      • Data was processed by monolithic applications running on mainframes.
      • Batch processing was the norm.
      • Data processing was used in non-time-critical areas such as payroll and accounting.
      • Only large enterprises and institutions could afford data processing.
      • Data processing could only support long term analysis and decision making processes – such as planning.
  • 8. Prior to the 80’s… Data was largely STRUCTURED
  • 9. Structured Data
  • 10. Structured Data Cont...
  • 11. Data Processing in the 80’s and Before
      • Data creation was a controlled process.
      • Rate of data creation was known and manageable.
      • Data creation and processing: Co-located.
  • 12. Database Systems of the 80’s and Prior
      • Navigational
      • Relational
  • 13. In the 90’s
      • Better connectivity allowed data to be collected from distributed, but finite sources.
      • Data created was directly captured and stored online.
      • Online Transaction Processing (OLTP) systems emerged.
      • Data processing could now support operational decision making since data capture and processing could be done in real time.
  • 14. In the 90’s Cont...
      • Data creation was still a controlled step and data was structured.
      • The volume of data generated was manageable.
      • Data processing was still centralized.
      • Relational Databases ruled the world of data processing.
  • 15. Then… “INTERNET HAPPENED” Changing the way we live in this world …
  • 16. Internet Traffic Trends
  • 17. Early Years of Internet
      Internet enabled e-commerce: B2B Transactions, B2C Transactions
      • Banking and Finance
      • Travel and Hospitality
      • Retail
      • Health Care
  • 18. Early Years of Internet Cont...
      • Volume of online transactions rapidly increased.
      • Database systems had to separate online processing from analysis to cope with the transaction volume.
      • Data Warehousing emerged.
      • Distributed databases also made their appearance.
  • 19. Early Years of Internet Cont...
      • In the early days, the processed data was still structured since it dealt with e-commerce transactions.
      • The need was for systems that focused on transactions: validation and recording.
      • Consequently, transaction and analysis systems had to be separated.
      ETL (Extract, Transform, Load) processes managed data conversion from one form to another (transaction → analysis).
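The ETL step mentioned on this slide can be sketched in a few lines of Python. Everything here is hypothetical: the record layout, the field names and the in-memory `warehouse` dictionary stand in for a real transaction system and a real data warehouse.

```python
from collections import defaultdict

# Hypothetical transaction records, roughly as an OLTP system might hold them.
TRANSACTIONS = [
    {"order_id": 1, "date": "2013-12-07", "amount": "250.00", "status": "OK"},
    {"order_id": 2, "date": "2013-12-07", "amount": "100.50", "status": "FAILED"},
    {"order_id": 3, "date": "2013-12-08", "amount": "75.25", "status": "OK"},
]


def extract(source):
    """Extract: read the raw rows from the transaction system."""
    return list(source)


def transform(rows):
    """Transform: drop failed orders, parse amounts, total them per day."""
    totals = defaultdict(float)
    for row in rows:
        if row["status"] == "OK":
            totals[row["date"]] += float(row["amount"])
    return dict(totals)


def load(daily_totals, warehouse):
    """Load: write the analysis-friendly form into the warehouse store."""
    warehouse.update(daily_totals)
    return warehouse


warehouse = load(transform(extract(TRANSACTIONS)), {})

if __name__ == "__main__":
    print(warehouse)
```

The transaction side keeps one row per order, while the analysis side only needs daily totals, which is why the two forms are kept apart.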
  • 20. In the New Millennium
      • Rapid adoption of Internet.
      • Explosion of e-commerce: Especially B2C.
      • The Internet enabled customers to seek out the best deal.
      • Businesses had to proactively entice customers.
          o To consume their products and services.
          o At the point of purchase.
      • Data processing moved from playing a supportive role to a “Business Critical” role.
          o Nature of certain businesses completely changed.
  • 21. Then Came SOCIAL NETWORKING and MOBILITY
  • 22. Impact of Social Networking
      • Success of B2C business transactions now depends on the ability to analyze customers’ past and current behaviour in real time!
      • Social Networking has become a source of valuable information to understand customer choice and behaviour.
      • Social Networking = Unstructured Data
      • Social Networking = Extremely large data generation rates
      • Social Networking = Highly distributed
  • 23. Unstructured and Distributed Data
  • 24. Unstructured Data
  • 25. Unstructured Data Cont...
  • 26. Unstructured Data Cont...
  • 27. Very High Data Creation Rates
      Year | Data Estimate
      2002 | 5 Billion GB
      2006 | 161 Billion GB
      2010 | 1277 Billion GB
      2015 | 7910 Billion GB
  • 28. The Situation Today…
      “Every two days now we create as much information as we did from the dawn of civilization up until 2003.” - Eric Schmidt, Google
      Structured Data constitutes only 5% of the total “Data Deluge”.
  • 29. Business Processes – Then and Now
      Then | Now
      Anticipate product / service need | Anticipate product / service need
      Marketing | Marketing
      Sales | Sales
      Transaction | Transaction
      Analysis | Analysis
      Refinement | Refinement
  • 30. Who Needs Rapid Data Analysis
      Banking and Finance
      Credit / Debit / ATM card transactions
          • Collaboration between banks
          • Fraud detection
          • Real-time analysis of CCTV to detect and prevent ATM attacks
      Credit / Loan approval
          • Credit analysis based on credit history as well as social network traces
  • 31. B2C ecommerce Sites (Online Stores)
  • 32. B2C – Product Comparison Sites
  • 33. Data Analysis in Elections
      The last USA elections
      Data-driven decision making played a huge role in creating a second term for the 44th President and will be one of the more closely studied elements of the 2012 cycle. Time: Nov 10, 2012
      Obama Election Head Office - Chicago
  • 34. Crime Investigation / Prevention / Surveillance
      Processing of email / chat / phone call traces
          • Accessed by Govt. agencies
      Processing of Facebook / Twitter posts / Chats
          • Sentiment analysis for crime prevention
  • 35. Common to All These Situations…
      • UNSTRUCTURED data.
      • Very large data sets – dynamic and rapidly increasing by the minute.
          o Terabytes of Data (BIG DATA)
      • Highly dispersed and distributed data generation.
      • Impossible to move such data to a central location for processing.
      • At the same time, very critical to process data and generate results in real time.
  • 36. Characteristics of New Age Data Processing Systems
      • Ability to handle unstructured data.
      • Ability to handle rapidly increasing volumes of data.
      • Ability to operate on distributed data sets.
      • Scalable.
      • Reliable / Fault tolerant.
      • Reasonable costs - one time & operational.
      These requirements have led to increasing interest in BIG DATA and the development of newer Data Storage & Analysis Techniques.
  • 37. Growing Interest in Big Data
  • 38. Conventional Database Systems Relational
  • 39. Conventional Database Systems Cont…
  • 40. Data Models and Database Systems Over the Years
  • 41. History of Data Models and Database Systems MAP REDUCE, COLUMNAR DATABASES & NO-SQL DATABASES
  • 42. How to Tackle Big Data – In Simple Words
      1. Break down the problem into manageable chunks.
      2. Spread the data and its processing over a number of nodes – typically cheap computers.
      3. Manage the process to ensure that nothing gets lost.
      4. Re-assemble the answer from the various parts to get your query answered.
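The four steps on this slide can be illustrated with a small Python sketch. It is only an analogy: worker threads on one machine stand in for the "cheap computers", and the task (summing numbers) is chosen for simplicity. The function names are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, chunk_size):
    """Step 1: break the problem into manageable chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def process_chunk(chunk):
    """Step 2: the work one node does on its own chunk (here, a sum)."""
    return sum(chunk)


def distributed_sum(data, chunk_size=4, workers=3):
    chunks = split_into_chunks(data, chunk_size)
    # Worker threads play the role of the nodes on the slide.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Step 3: map() yields exactly one result per chunk, in order,
        # so no partial answer is lost or duplicated.
        partials = list(pool.map(process_chunk, chunks))
    # Step 4: re-assemble the partial answers into the final answer.
    return sum(partials)


if __name__ == "__main__":
    print(distributed_sum(list(range(1, 101))))
```

A real cluster also has to cope with nodes that fail partway through, which is the hard part of step 3 and is not modelled here.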
  • 43. Map – Reduce : Technique to Handle BIG DATA
  • 44. Map – Reduce : Technique to Handle BIG DATA Cont...
  • 45. The Map – Reduce Technique
      Advantages | Drawbacks
      Can handle both structured and unstructured data. | Not very easy to set up and use.
      Can scale up with data size. | Raw Map – Reduce requires programming to set up.
      Open source implementations available: Reasonable costs. | Basic Map – Reduce suitable largely for batch processing. (Real time techniques have now been implemented to overcome this drawback.)
  • 46. Hadoop
      • Based on the Map-Reduce distributed processing architecture.
      • A task is mapped to a set of servers for processing.
      • Results from the servers are then reduced down to a single set.
      • Hadoop operates on the HDFS distributed file system.
          - HDFS ensures data redundancy.
      • Hadoop has in-built task management functionality to ensure reliability.
      • Interfaces available with other components: Open Systems and commercial.
      • Highly scalable and cost effective.
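The map and reduce steps described on this slide can be shown with the classic word count example. The sketch below runs in a single Python process and does not use Hadoop's API; the function names are illustrative. On a cluster, each input split would be mapped on a different server and the grouped keys would be spread across reducers.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all values by key so each key reaches one reducer."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    mapped = []
    for doc in documents:  # each document stands in for one input split
        mapped.extend(map_phase(doc))
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data data"]))
```

Because each map call looks at only one split and each reduce call at only one key, both phases can run in parallel across many machines.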
  • 47. HDFS
      Hadoop Distributed File System
      Goals (Ref: Hortonworks)
      • Store Petabytes of data.
      • Keep per node costs down to afford more nodes (scalability).
      • Commodity x86 servers, Open Source software.
      • Support computation in each server.
      • Handle failures: Failures treated like noise – inevitable.
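One way to see how a distributed file system can treat node failures as routine is to model block replication. The sketch below is a simplification under stated assumptions: the block size and replication factor are plain parameters, and replicas are placed round-robin, which is not how HDFS actually chooses nodes. All names are hypothetical.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Round-robin placement is an assumption made for this sketch only.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [nodes[(block + r) % len(nodes)]
                            for r in range(replication)]
    return placement


def surviving_blocks(placement, failed_node):
    """Blocks that still have at least one replica after one node fails."""
    return [block for block, holders in placement.items()
            if any(node != failed_node for node in holders)]


if __name__ == "__main__":
    blocks = split_into_blocks(300, 128)
    placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"])
    print(placement)
    print(surviving_blocks(placement, "n2"))
```

With three copies of every block on different nodes, losing any single node leaves every block readable, so the failure can be absorbed without data loss.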
  • 48. HDFS Cont...
  • 49. Big Data Analysis – The Big Picture!
  • 50. Components Relevant to Hadoop
      HBase | Database to store data and speed up queries.
      Hive | Warehouse implementation to support Analytics, Query and Visualization.
  • 51. HBase
      HBase is a Columnar, NoSQL database system.
      HBase | RDBMS
      Column oriented | Row oriented
      Flexible schema, add columns on the fly | Fixed schema
      Good with sparse tables (partially filled) | Not optimized for sparse tables
      No query language | SQL
      Wide tables | Narrow tables
      Joins using Map – Reduce | Optimized for joins
      Tight integration with Map – Reduce | Not integrated (usually) with MR
      Horizontal scalability – just add hardware | Hard to scale and size down
      Good for semi-structured & structured data | Good only for structured data
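The "flexible schema" and "sparse tables" rows of the comparison can be illustrated with a toy table in Python. This is not HBase's API or storage format; `SparseTable` and its methods are invented here to show the idea that only filled cells are stored and that new columns need no schema change.

```python
class SparseTable:
    """Toy column-flexible table: only cells that hold a value are stored."""

    def __init__(self):
        self._rows = {}  # row key -> {column name -> value}

    def put(self, row_key, column, value):
        # Columns are created on the fly; no schema change is needed.
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column, default=None):
        return self._rows.get(row_key, {}).get(column, default)

    def stored_cells(self):
        """Number of cells actually stored (empty cells cost nothing)."""
        return sum(len(cols) for cols in self._rows.values())

    def columns(self):
        """All column names seen so far, across every row."""
        return sorted({c for cols in self._rows.values() for c in cols})


if __name__ == "__main__":
    table = SparseTable()
    table.put("user1", "name", "Asha")
    table.put("user1", "twitter", "@asha")
    table.put("user2", "name", "Ravi")
    table.put("user2", "phone", "555-0100")
    print(table.columns(), table.stored_cells())
```

A fixed-schema table with the same three columns would reserve six cells for these two rows; the sparse layout stores only the four that have values.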
  • 52. Hive
      • Hadoop can get difficult to configure and use!
      • Hive sits between Hadoop and the users of Hadoop.
      • It provides a familiar – TABLE like – environment for dealing with Hadoop.
      • It allows Data to be:
          o Read from Hadoop / HDFS
          o Written into Hadoop / HDFS
          o Queried from Hadoop / HDFS using the familiar SQL-like syntax
      • In the background, Hive converts all queries into efficient MAP – REDUCE tasks.
      • Hive is a Data Warehouse system for Hadoop.
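To give a feel for how a SQL-like query can be expressed as map and reduce steps, the sketch below hand-translates one simple aggregate query into Python. It is an illustration of the idea only and says nothing about how Hive plans queries internally; the `bookings` table, its columns and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Query being imitated (table and column names are hypothetical):
#   SELECT route, SUM(seats) FROM bookings WHERE status = 'OK' GROUP BY route
BOOKINGS = [
    {"route": "Pune-Mumbai", "seats": 2, "status": "OK"},
    {"route": "Pune-Goa", "seats": 1, "status": "FAILED"},
    {"route": "Pune-Mumbai", "seats": 3, "status": "OK"},
    {"route": "Pune-Goa", "seats": 4, "status": "OK"},
]


def map_rows(rows):
    """Map: apply the WHERE filter and emit (group key, value) pairs."""
    return [(r["route"], r["seats"]) for r in rows if r["status"] == "OK"]


def reduce_pairs(pairs):
    """Reduce: sort by key (the shuffle), then SUM the values per key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return {key: sum(value for _, value in group)
            for key, group in groupby(ordered, key=itemgetter(0))}


result = reduce_pairs(map_rows(BOOKINGS))

if __name__ == "__main__":
    print(result)
```

The WHERE clause becomes a filter in the map step, the GROUP BY column becomes the key, and the aggregate function becomes the reduce step.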
  • 53. HBase v/s Hive
      HBase | Hive
      Typically used for unstructured data and sparse tables. | Typically used as a Data Warehouse.
      Allows low latency random data access. | Main purpose is analysis and ad hoc querying.
      Main purpose is continuous operations such as accepting data feeds and committing them to HDFS. | Deals with Structured Data resulting from analysis of data stored in HDFS.
  • 54. Pioneers of Big Data
      eBay | In excess of 2500 computing cores
      Yahoo | In excess of 4000 nodes
      Facebook | More than 23,000 nodes
      Google | ?? (24 PB of data/day)
      LinkedIn | ??
      Source: Slide by Ian Brown
  • 55. Big Data Solution Suppliers
      Informatica, EMC, Oracle, IBM, Microsoft, Teradata, Amazon, Cloudera, Apache, Google
  • 56. Who Uses Big Data (2011)
  • 57. Case Study : redBus.in
      • redBus.in : Internet based bus ticket booking
      • Handles more than 10,000 routes
      • Goal
          o To capture each and every event happening on their website and correlate them
          o To identify if booking failures were due to absence of supply, or due to server problems
          o To understand which routes needed more buses
      • Volume of data: 500 GB
      • Expected response time: Less than 1 minute
      • Tool / service used : BigQuery from Google
  • 58. Case Study : Seagate
      • Seagate : Has manufactured more than 2 Billion hard drives
      • They maintain data comprising:
          o Information related to the 2 Billion hard drives
          o Manufacturing information
          o Supplier information
          o Customer information
      • 400 GB of data added per day to the Warehouse
      • Used Big Data techniques to analyze Test Data
      • Impact : Overall improvement in quality due to sharp identification of process and supplier issues
      • Tools used: Not known
  • 59. Case Study : Macy’s
      • They want to prevent an overload of irrelevant promotions going to their customers.
      • They are sending fewer, more focused messages to individual clients about products and special offerings that have a high likelihood of being appealing to that person.
      • They are combining point-of-sale information with
          o online browsing behaviours
          o response to emails
          o social media activity
          o and more …
      • To get a 360-degree view of each customer.
      • The result: fewer, more meaningful interactions with customers that drive greater loyalty, greater revenues, and lower churn.
  • 60. Other Applications of Big Data
      • Epidemic prediction
      • Weather predictions
      • Scientific experiments generating very large amounts of data such as the Super Collider.
      • Astronomy
      • Search for extraterrestrial intelligence
  • 61. Big Data Challenges
      • Hadoop and Big Data Technologies are time consuming to set up and use.
      • Building and running Hadoop jobs is non-trivial.
      • Running and analyzing queries and results does not leverage existing skills.
      • Requires special teams to initiate in an organization – along with associated costs.
  • 62. Who Should Know About Big Data
      • Decision Makers: To understand its capabilities and how to use it for Business gains.
      • Data Scientists: To be able to understand and apply the right techniques to solve Big Data problems.
      • Big Data Applications Developers: To know the building blocks, and nuts and bolts of putting together a Big Data processing system.
      • Big Data Analysts
      • IT Staff
  • 63. Big Data Macro Trends
      • Information generation growing 2 times faster than storage capacity.
      • Growth in data collection: 60% CAGR.
      • Information Management industry:
          o Sized at $100 Billion
          o Growing at 10% CAGR
      • Big Data sources are becoming more varied.
          o Mobile phones, sensors, etc.
      • Total Internet traffic will exceed 667 Exabytes by 2013.
      • Third party data availability is on the rise.
      • Hadoop is the fastest growing Big Data technology: downloads have increased more than 400% in the last two years.
  • 64. Big Data Market Size Projection
  • 65. Future of Big Data
  • 66. Career Opportunities
      Direct Opportunities
      • Internet products and services companies
      • Manufacturing companies
      • Banking and finance
      • Pharma
      • Govt Departments
      Indirect Opportunities
      • Handling outsourced Big Data analysis and development projects for the above organizations.
  • 67. Structured v/s Unstructured Data
      Unstructured Data
      1. Web server and search engine logs (“data exhaust”)
      2. Logs from other types of servers (e.g., telecom switches and gateways)
      3. E-Commerce / Web Commerce records
      4. Social Media / Gaming messages
      5. Multimedia – voice, video, images
      6. Sensor data / M2M communications
      Structured Data
      • Customer databases
      • Legacy BI / CRM / ERP systems
      • Inventory and Supply Chain
  • 68. Structured v/s Unstructured
      Aspect | Structured | Unstructured
      Representation | Discrete (rows and columns). | Binary large objects: less-defined boundaries, less easily addressable. Small discrete objects: information represented for a very specific purpose (e.g., SMTP Mail Msg.).
      Storage / Persistence | DBMS or file formats (e.g., VSAM). | Unmanaged, file structure or content repository.
      Metadata Focus | Syntax (e.g., location and format). | Semantics (descriptive and other markup).
      Integration Tools | ETL or ELT, Enterprise Information Integration via BizTalk and Batch Processing. | Batch Processing, Manual Data Entry, Custom solutions that involve a lot of code.
      Standards | SQL (and its multiple variations), ADO.Net, ODBC; many RDBMS support XML as another option. | Open XML, SMTP, SMS, CSV and Information and Content Exchange.
  • 69. Evolution of Data Transfer Rates
      Medium | Transfer Rate
      Modems | 56 Kilobits / Second
      T-1 Line | 1.544 Megabits / Second
      Ethernet | 10 Megabits / Second
      Fast Ethernet (LAN) | 100 Megabits / Second, 1 Gigabit / Second
      T-3 | 44.736 Megabits / Second
      Optical Fibres | Up to 20 Gigabits / Second (Dedicated)
      Next Internet Backbone | 2.4 Gigabits / Second
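These rates help explain why moving very large data sets to a central location is impractical. The sketch below estimates how long one terabyte would take over each medium in the table. It assumes the full nominal rate is available and ignores protocol overhead, so real transfers would be slower; the dictionary keys and function name are illustrative.

```python
# Nominal transfer rates from the table, in bits per second.
RATES_BPS = {
    "Modem": 56e3,
    "T-1 Line": 1.544e6,
    "Ethernet": 10e6,
    "T-3": 44.736e6,
    "Fast Ethernet (LAN)": 100e6,
    "Next Internet Backbone": 2.4e9,
    "Optical Fibre (dedicated)": 20e9,
}

ONE_TERABYTE = 1024 ** 4  # bytes, using the binary definition


def transfer_seconds(num_bytes, bits_per_second):
    """Ideal transfer time: 8 bits per byte, no overhead or contention."""
    return num_bytes * 8 / bits_per_second


if __name__ == "__main__":
    for medium, rate in RATES_BPS.items():
        days = transfer_seconds(ONE_TERABYTE, rate) / 86_400
        print(f"{medium:28s} {days:12.3f} days")
```

Even over 10 Megabit Ethernet a single terabyte takes about ten days under these ideal assumptions, which is one reason Big Data systems move the computation to the data rather than the other way round.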
  • 70. History of Analytics
  • 71. Prior to the 80’s E-commerce did not exist
  • 72. Prior to the 80’s Data entry, storage, and processing were sequential and displaced in time
  • 73. Prior to the 80’s Data was processed by monolithic applications running on mainframes. Batch processing was the norm
  • 74. Prior to the 80’s Data processing was used in non-time-critical areas such as payroll, accounting
  • 75. Prior to the 80’s Only large enterprises and institutions could afford the cost of processing data
  • 76. Prior to the 80’s Data processing could only support long term analysis and decision making processes – such as planning
  • 77. Hadoop – Value Adding Projects/Products
      Hadoop
      1. HBase
      2. Cassandra
      3. Mongo
      4. CouchDB
  • 78. Projects/Products Adding Value to Hadoop
      HBase
      The standard Hadoop database, an open-source, distributed, versioned, column-oriented store, providing Bigtable-like capabilities over Hadoop. HBase includes base classes for backing Hadoop MapReduce jobs; query predicate push-down; optimizations for real time queries; a Thrift gateway and a REST-ful web service to support XML, Protobuf, and binary data encoding; an extensible JRuby-based (JIRB) shell; and support for the Hadoop metrics subsystem. Like Hadoop, HBase is an Apache project, hosted at http://hbase.apache.org/
      Cassandra
      Apache Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo’s fully distributed design and Bigtable's ColumnFamily-based data model. The Cassandra project lives at http://cassandra.apache.org/ A good example of using Cassandra together with Hadoop lies in the Datastax Brisk platform - learn more at http://www.datastax.com/
  • 79. Projects/Products Adding Value to Hadoop Cont…
      Mongo
      An open source, scalable, high-performance, schema-free, document-oriented database written in C++. The MongoDB project is hosted at http://www.mongodb.org/. To use Mongo and Hadoop together, check out https://github.com/mongodb/mongo-hadoop
      CouchDB
      Apache CouchDB is a document-oriented database supporting queries and indexing in a MapReduce fashion using JavaScript. CouchDB provides APIs that can be accessed via HTTP requests to support web applications. Learn more at http://couchdb.apache.org/
  • 80. Big Data Applications: Additional Ideas
      • Balance Sheet Analysis
      • Manufacturing Data Analysis
      • Production Systems Diagnostics and Pattern Identification
  • 81. Thank you for your attention! Please ask questions, if any!