The sources of information are expanding, and many of the new sources are machine generated. They include both very large files (seismic scans can be 5 TB per file) and massive numbers of small files (email, social media). For decades, leading companies have sought to leverage new sources of data, and the insights that can be gleaned from them, as new sources of competitive advantage. These new sources include:
• More detailed structured data
• New unstructured data
• Device-generated data
But big data isn’t only about data. A comprehensive big data strategy also needs to consider the role and prominence of new enabling technologies such as:
• Scale-out storage
• MPP database architectures
• Hadoop and the Hadoop ecosystem
• In-database analytics
• In-memory computing
• Data virtualization
• Data visualization
Big Data on the Cloud
Dr. Putchong Uthayopas
Department of Computer Engineering, Faculty of Engineering, Kasetsart University
Email: firstname.lastname@example.org
We are living in the world of data
• Video Surveillance
• Social Media
• Mobile
• Sensors
• Smart Grids
• Geophysical Exploration
• Medical Imaging
• Gene Sequencing
Why now?
• The Internet creates the ability to gather data together at a scale never seen before.
  – Data from humans
  – Data from sensors
• Crowdsourcing is now being practiced
  – User-generated data is flooding the world
• New devices and tools make it easy to generate data
Big Data
“Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”
Reference: “What is big data? An introduction to the big data landscape.”, Edd Dumbill, http://radar.oreilly.com/2012/01/what-is-big-data.html
Amazon View of Big Data
Big data refers to a collection of tools, techniques and technologies which make it easy to work with data at any scale. These distributed, scalable tools provide flexible programming models to navigate and explore data of any shape and size, from a variety of sources.
Information as an Asset
• The cloud will enable larger and larger data sets to be easily collected and used
• People will deposit information into the cloud
  – Like a bank or a personal warehouse
• New technology will emerge
  – Larger, scalable storage technology
  – Innovative and complex data analysis and visualization for multimedia data
  – Security technology to ensure privacy
• The cloud will be mankind's intelligence and memory!
Google Cloud Platform
• App Engine – Mobile and web apps
• Cloud SQL – MySQL on the cloud
• Cloud Storage – Data storage
• BigQuery – Data analysis
• Google Compute Engine – Processing of large data
Amazon
• Amazon EC2
  – Computation service using VMs
• Amazon DynamoDB
  – Large, scalable NoSQL database
  – Fully distributed, shared-nothing architecture
• Amazon Elastic MapReduce (Amazon EMR)
  – Hadoop-based analysis engine
  – Can be used to analyze data from DynamoDB
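Amazon EMR is Hadoop based, and Hadoop's core programming model is MapReduce. The following is a minimal single-process Python sketch of that model (map, shuffle, reduce) applied to word counting. It does not call any EMR, Hadoop, or DynamoDB API; the function names and sample records are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input; a real job would read records from distributed storage.
sample_records = ["big data on the cloud", "big data cannot be moved"]
counts = word_count(sample_records)
```

In a real cluster the map and reduce steps run in parallel on many machines and the shuffle moves data across the network; this sketch only shows how the three steps fit together.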
Trends
• A move toward large and scalable virtual infrastructure
  – Providing computing services
  – Providing basic storage services
  – Providing large, scalable databases
    • NoSQL
  – Providing analysis services
• All these services have to come together
  – Big data cannot be moved!
Issues
• Security
  – Will you let important data accumulate outside your organization?
    • If the data is not important, why analyze it?
  – Who owns the data? If you discontinue the service, is the data destroyed properly?
  – Protection in a multi-tenant environment
• Big data cannot be moved easily
  – Processing has to be near the data; you cannot simply ship data around
    • So you ultimately have to select the same cloud for your processing. Is it available, easy, fast?
• New learning and development costs
  – Is new programming or porting needed?
  – Are the tools mature enough?
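To make the "big data cannot be moved easily" point concrete, here is a rough back-of-the-envelope sketch in Python. It estimates how long a single 5 TB file (the seismic scan size mentioned at the start) would take to transfer over a network link. The link speeds and the 80% efficiency factor are illustrative assumptions, not measurements.

```python
def transfer_time_hours(size_bytes: float,
                        link_bits_per_sec: float,
                        efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move size_bytes over a network link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually achieved; 0.8 is a placeholder, not a measured value.
    """
    if link_bits_per_sec <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    effective_bits_per_sec = link_bits_per_sec * efficiency
    seconds = size_bytes * 8 / effective_bits_per_sec
    return seconds / 3600


TB = 10**12               # decimal terabyte, in bytes
FILE_SIZE_BYTES = 5 * TB  # one 5 TB seismic scan

# Hypothetical link speeds in bits per second, chosen only for illustration.
ASSUMED_LINKS = {
    "100 Mbit/s": 100e6,
    "1 Gbit/s": 1e9,
    "10 Gbit/s": 10e9,
}

if __name__ == "__main__":
    for name, bits_per_sec in ASSUMED_LINKS.items():
        hours = transfer_time_hours(FILE_SIZE_BYTES, bits_per_sec)
        print(f"{name}: about {hours:.1f} hours for one 5 TB file")
```

Under these assumptions a single file takes roughly 14 hours on a 1 Gbit/s link, which is why processing tends to end up in the same cloud where the data already lives.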
When to use Big Data on the Cloud
• When data is already on the cloud
  – Virtual organizations
  – Cloud-based SaaS services
• For startups
  – CAPEX to OPEX
  – No need to maintain large infrastructure
  – Focus on scalability and pay as you go
  – Data is on the cloud anyway
• For experimental projects
  – Pilots for new services
Summary
• Big data is coming.
  – Big data is being accumulated anyway
  – Knowledge is power.
    • Understand your customers better so you can offer better service
• Tools and technology are available
  – Still being developed fast
• The cloud is coming, so why not do big data on the cloud?
  – Probably not today, but soon