The sources of information are expanding, and many new sources are machine generated. The data arrives both as very large files (seismic scans can be 5 TB per file) and as massive numbers of small files (email, social media). For decades, leading companies have sought to leverage new sources of data, and the insights that can be gleaned from them, as sources of competitive advantage:
• More detailed structured data
• New unstructured data
• Device-generated data
But big data isn't only about data. A comprehensive big data strategy also needs to consider the role and prominence of new enabling technologies such as:
• Scale-out storage
• MPP database architectures
• Hadoop and the Hadoop ecosystem
• In-database analytics
• In-memory computing
• Data virtualization
• Data visualization
Content and service providers, as well as global organizations that need to distribute large content files, are challenged with managing distributed storage systems and ensuring their performance. A new approach that uses a single storage pool in the cloud, with policies for content placement, multi-tenancy, and self-service, can therefore benefit their business.
We've found that our early-adopter customers take a common approach on their journey to big data. First, they build on an infrastructure foundation that consists of elastic and scalable storage as well as analytics that can access all types of data. Next, they focus on improving the analytics process. Lastly, they embed big data into their applications and enable actionable insight. Customers who have used this approach have been able to transform into a more predictive enterprise.
Are You Ready for Big Data?
Dr. Putchong Uthayopas
Department of Computer Engineering, Faculty of Engineering, Kasetsart University
firstname.lastname@example.org
We Are Living in the World of Data
• Video Surveillance
• Social Media
• Mobile Sensors
• Gene Sequencing
• Smart Grids
• Geophysical Exploration
• Medical Imaging
Big Data
"Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it."
Reference: "What is big data? An introduction to the big data landscape.", Edd Dumbill, http://radar.oreilly.com/2012/01/what-is-big-data.html
The Value of Big Data
• Analytical use
– Big data analytics can reveal insights that were previously hidden because the data was too costly to process.
– Example: peer influence among customers, revealed by analyzing shoppers' transactions together with social and geographical data.
– Being able to process every item of data in reasonable time removes the troublesome need for sampling and promotes an investigative approach to data.
• Enabling new products
– Facebook has been able to craft a highly personalized user experience and create a new kind of advertising business.
3 Characteristics of Big Data
• Volume
– Volumes of data are larger than those conventional relational database infrastructures can cope with.
• Velocity
– The rate at which data flows in is much faster.
– Mobile events and interactions by users.
– Video, images, and audio from users.
• Variety
– The source data is diverse and doesn't fall into neat relational structures, e.g. text from social networks, image data, or a raw feed directly from a sensor source.
Big Data Challenge
• Volume
– How to process data so big that it cannot be moved or stored.
• Velocity
– A lot of data arrives so fast that it cannot all be stored, such as web usage logs, Internet traffic, and mobile messages. Stream processing is needed to filter out unused data or extract knowledge in real time.
• Variety
– There are so many types of unstructured data formats that conventional databases are of little use.
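The velocity point above can be illustrated with a minimal sketch of stream processing in Python. It assumes a hypothetical web usage log in which each line is `<status> <path>`; the log format, the function names, and the sample data are invented for illustration and are not from the original slides. Each record is read once, unwanted records are filtered out, and only a small running summary is kept instead of the raw data.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def parse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (status, path) pairs one line at a time; malformed lines are skipped."""
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            yield parts[0], parts[1]


def count_successful_hits(lines: Iterable[str]) -> Counter:
    """Drop non-200 events and keep only a running count per path."""
    hits = Counter()
    for status, path in parse_events(lines):
        if status == "200":   # filter: discard records we do not need
            hits[path] += 1   # extract: keep a small summary, not the raw record
    return hits


# Hypothetical sample; a real source would be an unbounded stream of log lines.
sample_log = [
    "200 /home",
    "404 /missing",
    "200 /home",
    "200 /about",
    "garbage",
]
top_pages = count_successful_hits(sample_log)
print(top_pages.most_common(1))
```

Because `parse_events` is a generator, the same code works on any iterable of lines, including a file or socket that is read incrementally, so memory use depends on the number of distinct paths rather than on the length of the stream.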
How to Deal with Big Data
• Integration of
– Storage
– Processing
– Analysis algorithms
– Visualization
[Diagram: massive data, stream processing, storage, processing, analysis, visualization]
A New Approach for Distributed Big Data
Storage Islands (L.A., Boston, London)
• Disparate Systems
• Manual Administration
• One Tenant, Many Systems
• IT-Provisioned Storage
Single Storage Pool (L.A., Boston, London)
• Single System Across Locations
• Automated Policies
• Many Tenants, One System
• Self-Service Access
Hadoop
• Hadoop is a platform for distributing computing problems across a number of servers. It was first developed and released as open source by Yahoo.
– It implements the MapReduce approach pioneered by Google in compiling its search indexes.
– MapReduce distributes a dataset among multiple servers and operates on the data where it sits: the "map" stage. The partial results are then recombined: the "reduce" stage.
• Hadoop utilizes its own distributed filesystem, HDFS, which makes data available to multiple computing nodes.
• The Hadoop usage pattern involves three stages:
– loading data into HDFS,
– MapReduce operations, and
– retrieving results from HDFS.
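The map and reduce stages described above can be sketched in plain Python with a word count. This is a single-process illustration of the idea only, not Hadoop's API: the two strings in `chunks` stand in for pieces of a dataset held on different servers, and the function names are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the values emitted by all mappers by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: recombine the partial results for each key into one total."""
    return {key: sum(values) for key, values in grouped.items()}


# Each string stands in for a block of data stored on a different server.
chunks = ["big data is big", "data moves fast"]
mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

In a real cluster the map calls run in parallel on the machines that hold each block, and the grouping step moves data between machines; here everything runs in one process so the flow of data is easy to follow.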
WHAT FACEBOOK KNOWS
Cameron Marlow calls himself Facebook's "in-house sociologist." He and his team can analyze essentially all the information the site gathers.
http://www.facebook.com/data
Study of Human Society
• Facebook, in collaboration with the University of Milan, conducted an experiment that involved
– the entire social network as of May 2011
– more than 10 percent of the world's population.
• Analyzing the 69 billion friend connections among those 721 million people showed that
– four intermediary friends are usually enough to introduce anyone to a random stranger.
The Links of Love
• Often young women specify that they are "in a relationship" with their "best friend forever".
– Roughly 20% of all relationships for the 15-and-under crowd are between girls.
– This number dips to 15% for 18-year-olds and is just 7% for 25-year-olds.
• Anonymous US users who were over 18 at the start of the relationship
– The average of the shortest number of steps to get from any one U.S. user to any other individual is 16.7.
– This is much higher than the 4.74 steps you'd need to go from any Facebook user to another through friendship, as opposed to romantic, ties.
Figure: Graph showing the relationships of anonymous US users who were over 18 at the start of the relationship.
http://www.facebook.com/notes/facebook-data-team/the-links-of-love/10150572088343859
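Figures such as 4.74 and 16.7 are averages of shortest-path lengths in a graph. The sketch below shows how that quantity can be computed exactly on a tiny, made-up friendship graph using breadth-first search. The graph and function names are hypothetical, and the slides do not describe the method Facebook used; running an exact search from every node is only practical for small graphs.

```python
from collections import deque


def bfs_distances(graph, start):
    """Return the number of steps from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_separation(graph):
    """Average shortest-path length over all ordered pairs of connected nodes."""
    total = 0
    pairs = 0
    for person in graph:
        for other, steps in bfs_distances(graph, person).items():
            if other != person:
                total += steps
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical four-person chain: ann - bob - cat - dan.
friends = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat"],
}
avg = average_separation(friends)
print(round(avg, 2))
```

The same function applied to a graph of romantic ties instead of friendships would give the second kind of average mentioned on the slide; a sparser graph generally yields longer paths.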
Why?
• Facebook can improve its users' experience
– make useful predictions about users' behavior
– make better guesses about which ads you might be more or less open to at any given time
• Right before Valentine's Day this year, a blog post from the Data Science Team listed the songs most popular with people who had recently signaled on Facebook that they had entered or left a relationship.
How Does Facebook Handle Big Data?
• Facebook built its data storage system using open-source software called Hadoop.
– Hadoop spreads the data across many machines inside a data center.
– Facebook uses Hive, open-source software that acts as a translation service, making it possible to query vast Hadoop data stores using relatively simple code.
• Much of Facebook's data resides in one Hadoop store more than 100 petabytes in size (a petabyte is a million gigabytes), says Sameet Agarwal, a director of engineering at Facebook who works on data infrastructure, and the quantity is growing exponentially. "Over the last few years we have more than doubled in size every year."
The Journey To Big Data
1. Big Data Infrastructure (Technology Focus)
• All Data
• Faster Answers
• Elastic & Scalable
2. Agile Analytics (People & Productivity Focus)
• Data Science
• Collaboration
• Self-Service
3. Predictive Enterprise (Application Focus)
• Real Time Decisions
• New Applications
• Data Monetization
[Diagram layers: Big Data Enabled Apps, Agile Process & Tools, Analytic Engines, Analytic Productivity Platform, Cloud Infrastructure]
Data Tsunami
• The data flood is coming, and there is nowhere to run!
– Data is being generated anytime, anywhere, by anyone
– Data is moving in fast
– Data is too big to move, too big to store
• Better be prepared
– Use this to enhance your business and offer better services to customers