The sources of information are expanding, and many of the new sources are machine generated. The data arrives both as very large files (seismic scans can be 5 TB per file) and as massive numbers of small files (email, social media). For decades, leading companies have sought to turn new sources of data, and the insights gleaned from them, into new sources of competitive advantage:
• More detailed structured data
• New unstructured data
• Device-generated data
But big data isn’t only about data. A comprehensive big data strategy also needs to consider the role and prominence of new enabling technologies such as:
• Scale-out storage
• MPP database architectures
• Hadoop and the Hadoop ecosystem
• In-database analytics
• In-memory computing
• Data virtualization
• Data visualization
Cloud Computing: A New Trend in IT
Dr. Putchong Uthayopas, Department of Computer Engineering, Faculty of Engineering, Kasetsart University, email@example.com
New Demand for IT Infrastructure
• Capacity: massive processing power, massive storage
• Security
• Availability
• Scalability: start small and grow on demand
• Cost effective
New Challenges
• High operating cost: manpower cost, equipment cost, energy cost
• High operating complexity: changing technology; increasing complexity of network, server, storage, and security
Dream machine
• Computer with infinite capacity
• Start small and grow big based on my demand
• Capacity can scale up and down on demand
• Pay only for what we use
• No complex operation and maintenance
What is Cloud Computing?
• A style of computing in which dynamically scalable and often virtualized computing resources are provided as a service over the Internet. Source: Wikipedia (cloud computing)
• Providers shown: Google, Salesforce, Amazon, Microsoft, Yahoo
Power Grid Inspiration for Computing?: Deliver ICT services as “computing utilities” to users
Economics of Cloud Usage
Source: “Above the Clouds: A Berkeley View of Cloud Computing”, RAD Lab, UC Berkeley
Why should we move to the cloud?
• Quick start-up: no need to purchase any equipment. Subscribe, pay, and use it.
• Scalability: less demand, less computing power; more demand, more computing power
• Elasticity: handle demand surges
• Less maintenance: no need to hire people to fix broken servers, deal with hacking, or do tuning
• Lower operating cost: pay only for what you really use and cut the cost of maintaining a huge infrastructure
• It is cool and trendy: just a stupid excuse for when people do not believe you ^_^
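The "pay only for what you really use" point can be illustrated with a little arithmetic. The sketch below is only an illustration: the hourly demand figures and the price are hypothetical, and it assumes a cloud server-hour costs the same as an owned server-hour, which may not hold in practice.

```python
# Hypothetical numbers for illustration only; not taken from the Berkeley paper.
PRICE_CENTS_PER_SERVER_HOUR = 10


def fixed_capacity_cost(demand_per_hour, price_cents):
    """Own enough servers for the peak hour and pay for them every hour."""
    return max(demand_per_hour) * len(demand_per_hour) * price_cents


def pay_per_use_cost(demand_per_hour, price_cents):
    """Pay only for the server-hours that are actually used."""
    return sum(demand_per_hour) * price_cents


# Servers needed in each of six consecutive hours, with one demand surge.
hourly_demand = [2, 2, 3, 10, 4, 2]

fixed = fixed_capacity_cost(hourly_demand, PRICE_CENTS_PER_SERVER_HOUR)
elastic = pay_per_use_cost(hourly_demand, PRICE_CENTS_PER_SERVER_HOUR)
print(fixed, elastic)  # 600 230
```

The gap between the two totals comes entirely from the surge hour: fixed capacity must be sized for the peak, while elastic capacity follows the demand curve.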
Cloud Computing Definition (NIST)
This cloud model is composed of five essential characteristics, three service models, and four deployment models.
5 Characteristics of Cloud System
• On-demand self-service
• Broad network access
• Resource pooling
• Rapid elasticity
• Measured service
Three Cloud Service Models
• Software as a Service: end-user software (Gmail, Google Docs, Facebook)
• Platform as a Service: programming platform (Azure, Google App Engine)
• Infrastructure as a Service: computer servers (VMware, EC2, OpenStack)
Cloud Deployment Model
• Private Cloud: internal cloud used by one organization
• Community Cloud: internal cloud shared by multiple organizations
• Public Cloud: provider's cloud shared by many users
• Hybrid Cloud: a cloud composed of two or more clouds
Using IaaS Cloud
• Users view the cloud as a number of servers
• It looks the same as a co-located server, but is actually a virtual server running Windows or one of many flavors of Linux
• Users can start, stop, and reboot servers from a web interface
• Normal web-based applications work fine
• Usage is charged on a pay-per-use basis
• You can try it at aws.amazon.com: opening a new account and starting a new server takes less than 30 minutes
Using PaaS Cloud
• A PaaS cloud gives you an API to program on the cloud
• Applications need to be ported, e.g. .NET to Windows Azure, Python to Google App Engine
• Pros and cons: more lightweight than IaaS, but needs some application porting effort
Using SaaS Cloud
• You have already used it! Facebook, Gmail, Calendar, Google Maps
• Run applications directly from your browser
• No coding, no porting: just pay and use, or use it for free
What can the Cloud do?
• Server consolidation: an IaaS cloud lets you use many servers hosted by service providers
• Scalable web applications: community websites like Sanook and Kapook; web apps for anything you want to do
• Back end for mobile apps: iCloud and Google Cloud are being used
The Cloud and I (diagram): money, video, data, music, books, computing power, personal information, services, pictures, applications, and games, all reached over the Internet. Access, storage, and sharing: anytime, anywhere, anyhow. Reliability, security, availability.
The Cloud and I: Google Docs (Office)
• Spreadsheet
• Word processor
• Presentation
• Calendar
• Gmail
The Cloud and I (diagram): calendar, pictures, music, and documents in my cloud (Google, Facebook, Dropbox, Amazon)
Work Life with a Cloud
• Appointments (Google Calendar): my secretary takes an appointment and adds it to the calendar; I see it on every device quickly, and so does she; my devices notify me
• Email (Gmail): I can go to any computer or device with a browser and my email follows me there; I have no need to install a mail client or maintain a mail server
Work Life with a Cloud
• Documents (Google Docs): I can create basic documents, good spreadsheets, and basic presentations without installing any software; I can download a document and edit it on my computer; I can share a document with others on the Internet and edit it together with them
• Storage (Google Drive, Dropbox): create a presentation on a notebook and drop it in Dropbox; present from an iPad or smartphone; secure, with no need to carry a thumb drive; easily share files with other people, making teamwork easy
Play Life with a Cloud
• Pictures: using Instagram, the photos and videos I take instantly appear on Twitter and Facebook, neatly cataloged; pictures can be shared, tagged, and commented on among my 2000 friends on Facebook! If I want, they will know where I was. (A little dangerous)
• Communication: my thoughts can be spread anytime, anyhow using Facebook, Google Plus, Multiply; I can even “hang out” with friends on Google Plus
Play Life with a Cloud
• Books: Amazon Kindle Store. Buy a book from Amazon and they will keep it on their cloud
• Unlimited bookshelves, no cleaning, no dusting
• Read your book on any device: iPad, iPhone, Android phone, tablet, PC, Mac
• I read mine on my iPad and my Galaxy S2 phone
Play Life with a Cloud
• Music: the iTunes Store lets you shop for music and movies; you can load them and play them on many of your devices
• The media industry is changing: now you can own a radio station or TV station and reach audiences around the world
• Power is shifting from the infrastructure provider (TV station) to the content creator (like Grammy etc.)
Some Existing Cloud Computing Systems: Amazon AWS, Google App Engine, Microsoft Azure, OpenStack
Google App Engine
• Google App Engine is a platform for developing and hosting web applications in Google-managed data centers, first released as a beta version in April 2008.
• Google App Engine virtualizes applications across multiple servers and data centers.
• Google App Engine is free up to a certain level of used resources. Fees are charged for additional storage, bandwidth, or CPU cycles required by the application.
App Engine Architecture (diagram): requests and responses pass through a Python VM process that holds the app and the standard library over a read-only file system. Stateless APIs: urlfetch, mail, images. Stateful APIs: datastore, memcache.
Cloud Application Development (diagram): UI Tier (Web 2.0), Processing Tier, Data Management Tier
• Separate the processing logic, UI, and data management (DM) tiers
• Use Service-Oriented Architecture (SOA) design
OpenStack Architecture
OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface.
We are living in the world of Data: Video Surveillance, Social Media, Mobile Sensors, Gene Sequencing, Smart Grids, Geophysical Exploration, Medical Imaging
Big Data
“Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”
Reference: “What is big data? An introduction to the big data landscape.”, Edd Dumbill, http://radar.oreilly.com/2012/01/what-is-big-data.html
The Value of Big Data
• Analytical use: big data analytics can reveal insights previously hidden by data too costly to process, for example peer influence among customers, revealed by analyzing shoppers’ transactions together with social and geographical data. Being able to process every item of data in reasonable time removes the troublesome need for sampling and promotes an investigative approach to data.
• Enabling new products: Facebook has been able to craft a highly personalized user experience and create a new kind of advertising business.
3 Characteristics of Big Data
• Volume: volumes of data are larger than conventional relational database infrastructures can cope with
• Velocity: the rate at which data flows in is much faster, driven by mobile events and user interactions and by video, images, and audio from users
• Variety: the source data is diverse and doesn’t fall into neat relational structures, e.g. text from social networks, image data, a raw feed directly from a sensor source
Big Data Challenge
• Volume: how to process data so big that it cannot be moved or stored
• Velocity: a lot of data arrives so fast that it cannot all be stored, such as web usage logs, Internet traffic, and mobile messages. Stream processing is needed to filter out unneeded data or extract knowledge in real time.
• Variety: there are so many unstructured data formats that conventional databases are of little use
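The velocity point can be sketched in a few lines. This is a minimal illustration, assuming a made-up log format of "user url" per line; the function names and the "/static/" filter rule are hypothetical. Each record is examined once as it arrives, unneeded records are dropped, and only a small running summary is kept.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def parse_log_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (user, url) pairs one at a time, skipping malformed lines."""
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            yield parts[0], parts[1]


def count_page_hits(lines: Iterable[str]) -> Counter:
    """Keep only a running count per URL; raw log lines are never stored."""
    hits = Counter()
    for _user, url in parse_log_lines(lines):
        if url.startswith("/static/"):
            continue  # filter out records we do not need
        hits[url] += 1
    return hits


# A tiny stand-in for an endless stream of web usage log lines.
sample_stream = iter([
    "alice /home",
    "bob /static/logo.png",
    "carol /home",
    "broken-line",
    "alice /search",
])

hits = count_page_hits(sample_stream)
print(hits.most_common(1))  # [('/home', 2)]
```

Because the input is consumed through an iterator, memory use depends on the number of distinct URLs, not on the length of the stream.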
How to deal with big data: integration of storage, processing, analysis algorithms, and visualization (diagram: a massive data stream feeds stream processing, followed by storage, processing, analysis, and visualization)
Hadoop
• Hadoop is a platform for distributing computing problems across a number of servers. First developed and released as open source by Yahoo.
• It implements the MapReduce approach pioneered by Google in compiling its search indexes: a dataset is distributed among multiple servers that operate on the data (the “map” stage), and the partial results are then recombined (the “reduce” stage).
• Hadoop utilizes its own distributed filesystem, HDFS, which makes data available to multiple computing nodes.
• The Hadoop usage pattern involves three stages: loading data into HDFS, MapReduce operations, and retrieving results from HDFS.
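The map and reduce stages can be sketched in plain Python. This is only an in-process illustration of the idea, not Hadoop itself: there is no HDFS and no cluster, and the helper names (split_into_chunks, map_stage, shuffle, reduce_stage) are made up for this example. The "chunks" stand in for the pieces of a dataset that would sit on different servers.

```python
from collections import defaultdict


def split_into_chunks(records, num_chunks):
    """Pretend to distribute the dataset across num_chunks servers."""
    return [records[i::num_chunks] for i in range(num_chunks)]


def map_stage(chunk):
    """Each server turns its own lines into (word, 1) pairs."""
    pairs = []
    for line in chunk:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(all_pairs):
    """Group the partial results by key so each word can be reduced."""
    grouped = defaultdict(list)
    for word, count in all_pairs:
        grouped[word].append(count)
    return grouped


def reduce_stage(grouped):
    """Recombine the partial results into one total per word."""
    return {word: sum(counts) for word, counts in grouped.items()}


documents = ["big data is big", "cloud and big data", "data in the cloud"]
chunks = split_into_chunks(documents, 2)
mapped = [pair for chunk in chunks for pair in map_stage(chunk)]
word_counts = reduce_stage(shuffle(mapped))
print(word_counts["big"], word_counts["data"], word_counts["cloud"])  # 3 3 2
```

The map stage never needs to see another server's chunk, which is what lets the work be spread across many machines.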
WHAT FACEBOOK KNOWS
Cameron Marlow calls himself Facebook's "in-house sociologist." He and his team can analyze essentially all the information the site gathers. http://www.facebook.com/data
Study of Human Society
• Facebook, in collaboration with the University of Milan, conducted an experiment that involved the entire social network as of May 2011: more than 10 percent of the world's population.
• Analyzing the 69 billion friend connections among those 721 million people showed that four intermediary friends are usually enough to introduce anyone to a random stranger.
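The measurement behind this result is an average shortest-path length in the friendship graph. The sketch below shows the idea on a toy graph with made-up names, using exact breadth-first search from every person. It is an assumption-laden illustration rather than the method used in the study; an exact all-pairs search like this would not scale to hundreds of millions of users.

```python
from collections import deque


def shortest_path_lengths(graph, start):
    """Breadth-first search: number of friendship hops from start to everyone reachable."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph[person]:
            if friend not in distances:
                distances[friend] = distances[person] + 1
                queue.append(friend)
    return distances


def average_separation(graph):
    """Average number of hops over all ordered pairs of connected people."""
    total_hops = 0
    pair_count = 0
    for person in graph:
        for other, hops in shortest_path_lengths(graph, person).items():
            if other != person:
                total_hops += hops
                pair_count += 1
    return total_hops / pair_count


# Toy friendship graph: a chain of five hypothetical people.
friends = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat", "eve"],
    "eve": ["dan"],
}

print(average_separation(friends))  # 2.0
```

Note that hops count the links on the path, so the number of intermediary friends between two people is one less than the number of hops.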
The links of Love
• Often young women specify that they are “in a relationship” with their “best friend forever”. Roughly 20% of all relationships for the 15-and-under crowd are between girls. This number dips to 15% for 18-year-olds and is just 7% for 25-year-olds.
• Among anonymous US users who were over 18 at the start of the relationship, the average shortest number of steps to get from any one U.S. user to any other through romantic ties is 16.7. This is much higher than the 4.74 steps you’d need to go from any Facebook user to another through friendship, as opposed to romantic, ties.
• The graph shows the relationships of anonymous US users who were over 18 at the start of the relationship.
http://www.facebook.com/notes/facebook-data-team/the-links-of-love/10150572088343859
Why?
• Facebook can improve users' experience, make useful predictions about users' behavior, and make better guesses about which ads you might be more or less open to at any given time
• Right before Valentine's Day this year, a blog post from the Data Science Team listed the songs most popular with people who had recently signaled on Facebook that they had entered or left a relationship
How does Facebook handle Big Data?
• Facebook built its data storage system using open-source software called Hadoop, which spreads data across many machines inside a data center.
• It uses Hive, open-source software that acts as a translation service, making it possible to query vast Hadoop data stores using relatively simple code.
• Much of Facebook's data resides in one Hadoop store more than 100 petabytes in size (one petabyte is a million gigabytes), says Sameet Agarwal, a director of engineering at Facebook who works on data infrastructure, and the quantity is growing exponentially. "Over the last few years we have more than doubled in size every year."
San Diego Supercomputer Center Unleashes the Value of its User Data
• Challenge: to make SDSC’s data stores widely available so that they could be accessed, searched, and shared anywhere via Web-based access, SDSC made the decision to move from a tape-based system to cloud-based object storage.
• Solution: OpenStack Object Storage uses open-source software to create redundant, scalable storage using clusters of standardized servers to store petabytes of accessible data. Objects are written to multiple hardware devices, with the OpenStack software responsible for ensuring data replication and integrity across the cluster. Storage clusters can scale horizontally by adding new nodes. Should a node fail, OpenStack replicates its content from other active nodes.
• Benefit: today, SDSC's Cloud Storage provides academic and research partners with a convenient and affordable way to store, share, and archive data, including extremely large data sets. Utilizing the OpenStack Object Storage software, files (objects) are written to multiple physical storage arrays simultaneously, ensuring that at least two verified copies exist on different servers at all times.
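The replication behaviour described above can be illustrated with a toy model. This is a sketch under stated assumptions, not OpenStack Object Storage code: the TinyObjectStore class, its hash-based placement rule, and the node names are all hypothetical. It only shows the two ideas from the slide: every object is written to more than one node, and a failed node's content is re-replicated from surviving copies.

```python
import hashlib


class TinyObjectStore:
    """Toy in-memory model of replicated object storage (hypothetical)."""

    def __init__(self, node_names, replicas=2):
        self.replicas = replicas
        self.nodes = {name: {} for name in node_names}

    def _placement(self, key):
        """Pick `replicas` distinct nodes for a key, starting from a hash of the key."""
        names = sorted(self.nodes)
        start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(names)
        return [names[(start + i) % len(names)] for i in range(self.replicas)]

    def put(self, key, data):
        """Write the object to every node chosen for it."""
        for name in self._placement(key):
            self.nodes[name][key] = data

    def get(self, key):
        """Return the object from any node that still holds a copy."""
        for contents in self.nodes.values():
            if key in contents:
                return contents[key]
        raise KeyError(key)

    def holders(self, key):
        """List the nodes that currently hold a copy of the object."""
        return [name for name, contents in self.nodes.items() if key in contents]

    def fail_node(self, name):
        """Remove a node and re-replicate its objects from surviving copies."""
        lost = self.nodes.pop(name)
        for key in lost:
            self.put(key, self.get(key))


store = TinyObjectStore(["node1", "node2", "node3", "node4"], replicas=2)
store.put("seismic-scan", b"raw bytes")
first_holder = store.holders("seismic-scan")[0]
store.fail_node(first_holder)
print(store.get("seismic-scan"), len(store.holders("seismic-scan")))
```

Adding a node to the dictionary would be the toy equivalent of scaling the cluster horizontally; a real system also has to rebalance existing objects, which this sketch leaves out.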
Cloud Library
• Cloud Library is an e-book lending service that will allow users to browse and borrow digital books directly from their iPads, Nooks, and Android-based tablets.
• 3M will outfit local libraries with its own software, hardware, and e-book collection, which users will be able to access via special apps or 3M's new eReaders, which will be synced with available digital content.
• Discovery Terminal download stations in libraries allow visitors to leaf through the collection from a touch-based interface.
• Random House and IPG have signed on to the initiative.
Moving KU Computer Engineering on the Cloud: Introduction
• The Department of Computer Engineering is one of the leading computer engineering departments in Thailand (23 years)
• Research and education: 30 faculty members, 20-30 Ph.D. students, 50 Master's, 120 MSIT, and 400 undergraduate students
• Mission: support teaching and research by providing server, network, and service infrastructure; drive toward a mobile, anytime, anywhere infrastructure
Moving KU Computer Engineering on the Cloud: Challenge and Opportunity
• Must provide a scalable and reliable infrastructure: servers, storage, services
• Previously, a number of physical servers were used: they got old quickly, were hard to maintain, took up a lot of space, and consumed a lot of power and cooling
Moving KU Computer Engineering on the Cloud: Cloud is the Solution
• For servers, use a VM cloud (VMware) to consolidate all small servers into a set of VMs on only 5 machines
• Every lab and professor can request VMs for their own use
• Can scale easily by adding more physical servers
• Moving to centralized large storage using a NAS/SAN storage cloud
Standards are needed
The IEEE Standards Association (IEEE-SA) has formed two new Working Groups (WGs) around IEEE P2301 and IEEE P2302.
• IEEE P2301 covers cloud computing standards in critical areas such as application, portability, management, and interoperability interfaces, as well as file formats and operation conventions.
• IEEE P2302 defines the essential topology, protocols, functionality, and governance required for reliable cloud-to-cloud interoperability and federation.
Trend (layered stack, top to bottom): Software as a Service, Framework as a Service, Virtualized Infrastructure, Physical Infrastructure
Cloud computing open issues
• People do not trust others to hold their important data. So why do people trust their bank to hold all their money?
• People do not trust that a cloud provider can provide a robust and secure environment. How many times has your system gone down or been hacked compared to Google or Facebook? Does the average company have better staff than an ISP that deals with these problems on a daily basis?
• Interesting!
Conclusion
• Cloud Computing is here!
• You are using it every day at the SaaS level, such as Facebook and Gmail
• Let's fly above the cloud and see what it can do for you.