Introduction to Big Data An analogy between Sugar Cane & Big Data
Image Source: alternative-energy-fuels.com
Image Source: MicFarris.com
Jean-Marc Desvaux – March 2012
Session Abstract:
What is Big Data? Where does it apply?
What are the technologies behind it?
Is it going to replace your RDBMS? …
Big data, it’s all Silicon Valley is talking about. It’s the new buzz word after ‘cloud.’
“Everybody is speaking of it and many are convinced it is the only way forward. As always, such dramatic statements are not only dangerous but serve to put some people off the concept.”
Source: Tom Kyte’s Big Data Are you ready ? presentation
Big Data is data that exceeds the processing capacity of conventional database systems.
It’s too big, too fast or does not fit the structures of database architectures.
To gain value from this type of data you need an alternative way to process it.
Why is this happening?
Data is growing faster than computers are getting bigger.
A catch-all term.
Includes social network data, Web logs, MP3s, unstructured Web page content, XML, GPS tracking data, vehicle telemetry, financial market data and many more…
Can be characterized by the 3 Vs:
Image Source: Tom Kyte’s Big Data Are you ready ? presentation
Volume
Data growing faster than machines getting bigger.
Data sources adding up.
Velocity
Rate of acquisition and desired rate of consumption.
Variety
Extends beyond structured data, includes unstructured data of all varieties.
Image Source: Tom Kyte’s Big Data Are you ready ? presentation
Big Data value to an Organisation falls into two main categories:
- Analytical Use
- Enabling new products and services
Analytical Use
To reveal insights previously hidden because they were hard to record and exploit.
An edge on classic Analytics based on sampling and more “static” & predetermined reports.
It promotes an investigative approach to data and puts the data scientist and analyst in the spotlight.
Hal Varian, chief economist at Google:
“I keep saying that the sexy job in the next 10 years will be statisticians”
Some terms linked to the Analytical Use of Big Data
Sentiment Analysis: Mining the Web in real time and getting a quick read of what people are thinking.
Named-entity recognition (NER) (also known as entity identification and entity extraction) is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. (ex: does “Big B” in a tweet stand for Big Brother or Amitabh Bachchan?)
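To make the Sentiment Analysis idea concrete, here is a minimal sketch of lexicon-based scoring. The word lists and the function names (sentiment_score, label) are hypothetical and chosen only for illustration; production systems typically rely on much larger lexicons or trained models.

```python
import re

# Hypothetical, hand-picked word lists used only for this illustration.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}


def sentiment_score(text: str) -> int:
    """Return (# positive words) - (# negative words) found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def label(text: str) -> str:
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for message in ["I love this great phone", "terrible and slow service"]:
        print(message, "->", label(message))
```

Counting words this way ignores negation and sarcasm, which is one reason the slide speaks of a "quick read" rather than a precise measurement.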
Product/Service Enabler
Some products and services cannot exist if not backed up by Big Data technologies:
- Need to Scale
- Need a fast Feedback Loop on complex analytics.
Highly successful Web startups pioneering Big Data technologies through R&D to enable new types of products are a good example: Google, Yahoo, Amazon, Facebook.
Sectors with Fast Adoption and High Potential
- Financial Sector
- Telecommunications
- Government
- Health
- Retail
Big Data Sources: Internal & Data Marketplaces.
Internal sources
- Time Attendance logs
- RFID sensor logs
- Security logs
- Vehicle GPS tracking
- Machinery/Telemetry logs
- Pictures & videos
- Enterprise Social Networks
- Service Forum/Discussions
- ….
Mostly anything unstructured or simply structured
An Enterprise Architecture for Big Data An analogy with a Sugar Cane Factory
A Sugar Factory
SUGAR CANE FIELDS
ACQUIRE (HARVEST)
EXTRACT/SHRED
EVAPORATE/DISTILL/BOIL
DRY/STORE/SUGAR
BOTTOM LINE = VALUE
An Enterprise Big Data Factory
DATA SOURCES (RDBMS & Data Marketplaces)
ACQUIRE (HARVEST): HDFS (Hadoop Distributed FS), NoSQL Database, RDBMS, Enterprise Applications
ORGANIZE (EXTRACT): Map Reduce (Hadoop), Big Data Connectors, RDBMS Connectors
ANALYSE (SHRED/DISTILL/BOIL): Data Warehousing / RDBMS stores, Analytic Applications
BUSINESS INTELLIGENCE (DECIDE): the sweet part (sugar/rum)
BOTTOM LINE = VALUE
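The ORGANIZE stage of the factory names Map Reduce. The sketch below imitates the map, shuffle and reduce phases in plain, single-process Python to show the shape of such a job. It does not use the Hadoop API; the log format ("vehicle_id,km") and all function names are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and stream out (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical job: total distance per vehicle from GPS log lines "vehicle_id,km".
def km_mapper(line):
    vehicle_id, km = line.split(",")
    yield vehicle_id, float(km)


def total_reducer(key, values):
    return sum(values)


def run_job(records):
    return reduce_phase(shuffle(map_phase(records, km_mapper)), total_reducer)


SAMPLE_LOG = ["truck-1,12.5", "truck-2,3.0", "truck-1,7.5"]

if __name__ == "__main__":
    print(run_job(SAMPLE_LOG))
```

In a real cluster the map and reduce calls would run in parallel on many machines over data stored in HDFS; only the mapper and reducer functions would be written by the developer.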
Greenplum (EMC2)
An Example of a Turnkey Factory Solution
Another “Turnkey Factory” Example from Oracle
Targeting high-end Analytics
ACQUIRE (HARVEST)
ORGANIZE (EXTRACT)
ANALYSE (SHRED/DISTILL/BOIL)
BUSINESS INTELLIGENCE (DECIDE)
Image Source: Tom Kyte’s Big Data Are you ready ? presentation
The Microsoft way
Of course, you can build your own factory using widely available open source software, on which most turnkey factories are built.
Turning RDBMS into a legacy data store?
Not at all.
We need RDBMS to store high-value data and for its feature-rich approach (feature first).
NoSQL (scale first) is not a superset of RDBMS technologies (a bit like Einstein’s relativity to Newtonian physics).
Remember NoSQL is not “No SQL” but “Not Only SQL”
- Rise of Data Marketplaces
- Data Science tools development: more powerful & expressive toolsets for analysis
- Streaming data processing emerging tools (Twitter Storm, Yahoo S4, StreamBase): real-time enablement / Live BI
- Further cloud enablement
- Ease of integration with Enterprise Sources
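As a rough illustration of what streaming tools bring to "Live BI", the sketch below keeps a running count of events seen in the last N seconds and updates it as each event arrives. It is plain Python, not the API of Storm, S4 or StreamBase; the class name SlidingWindowCounter is hypothetical, and it assumes timestamps arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events whose timestamp falls within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add(self, timestamp):
        """Record one event and return the number of events still in the window."""
        self._timestamps.append(timestamp)
        self._evict(timestamp)
        return len(self._timestamps)

    def _evict(self, now):
        # Drop events that are older than the window (assumes ordered timestamps).
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()


if __name__ == "__main__":
    counter = SlidingWindowCounter(window_seconds=10)
    for t in [0, 5, 12, 30]:
        print(t, counter.add(t))
```

The point of the design is that the answer is refreshed per event instead of being recomputed by a batch job over stored data.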
To leverage Big Data you need something like a Sugar Factory.
It can be a very entry-level factory (Excel – Azure Source) or more complex.
The more complex and complete the factory, the more value at the end of the processing chain.
To turn Big Data technologies from developer-centric solutions into enterprise solutions, they must be combined with SQL solutions into a single proven infrastructure meeting the manageability and security requirements of enterprises.
The challenge for Enterprises is to simplify Big Data integration/engineering and leverage it where possible to improve their processes at tactical and strategic levels.
Architects & DBAs will be able to make choices between datastore technologies and will need to understand where one is better than the other.
Big Data has to be part of the Enterprise Applications Ecosystem, where it will be turned into value.