The Power of Big Data

Safely exploiting the power of data and information is vital to your business. In this seminar, Iain, Tim and David will look at the advantages and risks presented to your business by 'Big Data' and its potential for data analysis.

They will investigate a number of uses of NoSQL databases to support this analysis, and will discuss other business considerations, including how to distribute data across multiple locations and how to significantly reduce the potential cost of storage.

The seminar concludes with an overview of modern ways of securing your data assets, including different authentication methods and encryption.

Usage Rights

© All Rights Reserved

  • There's been a data revolution: it's everywhere. There is a huge variety of sources: news aggregators, social media, search engines and the Internet; new government initiatives (data.gov.uk); public data (geographical, weather, transport, literature, historical records). You have more than you think: customer information, sales and financial data, employee data, stock, intellectual property, logistics, sensors (Internet of Things). It's your biggest asset: it has monetary value and can be used in a huge variety of ways to improve your business.
  • Good question. There are as many definitions as people you ask. Jonathan Ward and Adam Barker of the University of St Andrews did a study that asked the big vendors. Oracle: "relational + unstructured data combined for Business Intelligence". Microsoft: "applying Artificial Intelligence and distributed computing to large datasets". Most definitions are technology-focused and vague. The study concluded with (click for quote).
  • There are huge amounts of data nowadays, and we need new techniques to analyse and store it. Did you know: 90% of data was generated in the last 2 years, and 2.5 exabytes of data are created every day (2.5 trillion megabytes, or 15,625,000 iPod Classics of 160 GB each). Data analysis on a huge scale.
  • Healthcare: predict trends in diseases and the effectiveness of treatments. For example, UK Biobank collected medical, lifestyle and geographical data from 500,000 people to find what causes the development of major diseases and how effective different treatments are. Scientific research: Folding@Home and SETI@Home. Market research: MIT's Billion Prices Project, Twitter sentiment, Google Analytics. Business operation analysis and optimisation: Tesco predicting stock, FedEx package tracking and logistics optimisation, Amazon stock layout optimisation. Advertising: Google AdWords and Facebook Ad Audiences. Technologies: Hadoop/MapReduce, NoSQL, distributed computing and virtualisation.
  • Hardware and expertise are expensive. There are few Big Data specialists (but the number is growing). It is not always the right tool (do you have "big data"?). Causation: remember the pirates?...
  • Key points: you have more data than you think, and you can do a lot more with it than you think. So: gather data on your own business and your customers, and use the analytical approach of Big Data to make informed business decisions.
  • Bill Clinton was in office, Ayrton Senna died in an accident during the San Marino Grand Prix, China got its first connection to the internet and The Lion King was released in cinemas.
  • To guarantee reliability, relational databases follow ACID. Atomicity requires that each transaction is "all or nothing". Consistency ensures that any transaction brings the database from one valid state to another. Isolation ensures that the concurrent execution of transactions results in the same system state that would be obtained if the transactions were executed serially, i.e. one after the other. Durability means that once a transaction has been committed, it remains committed. NoSQL databases often follow BASE instead (Basically Available, Soft state, Eventual consistency).
  • Twitter: Apache Cassandra. "Our geo team uses it to store and query their database of places of interest. The research team uses it to store the results of data mining done over our entire user base. Those results then feed into things like @toptweets and local trends. Our analytics, operations and infrastructure teams are working on a system that uses Cassandra for large-scale real-time analytics for use both internally and externally." Facebook: HBase, used for the messaging platform introduced at the end of 2010; it can deal with very high throughputs. BBC: CouchDB; the BBC is building a new environment that allows cost-effective building of dynamic content platforms. The Guardian: MongoDB, used for storage of articles. NASA: AllegroGraph, used for storing assets and providing meaningful search through the links between a variety of different kinds of assets, from software to drawings to documents to employee skills.
  • This is a very simple method of storing data: a unique key and the data associated with that key. There is an inherent expectation of being distributed over many machines, giving highly available data stores with minimal downtime.
  • Data is stored by column rather than by row. This is ideal for sparsely populated databases and gives large reductions in storage requirements. It is very good for finding aggregate values (think pivot table!). Use for your business: a single database containing everything purchased by your company over the past year. You can quickly do analysis such as average spend per employee or total amount spent by a particular department. Rather than pulling selected data from all over your business into a data warehouse in order to do analysis, you could store all the varied information in a single data store and run all kinds of analysis.
  • A more holistic approach to storing data. Documents with varying data and structures can be kept together, so you are not penalised as your business grows and your data model changes. Great for storing your business assets: think of a filing cabinet. Get a picture of a medical store.
  • Where the links between data (edges) become as important as the data itself (nodes). These are specialised data stores, particularly suited to social networks. If it is important to know exactly the relationship between one piece of data and another, this may be the solution to your problem. There is inherent value in the links, and their state can change. Unknown link.
  • Distribute your database over a number of (lower-cost) machines for an 'always on' solution. This reduces downtime and, hence, risk.
  • ID Quantique, a Swiss company, developed a machine that was used in a Swiss parliamentary election to securely transmit the election results using quantum cryptography. It works by using a technique called Quantum Key Distribution (QKD). QKD enables two parties to produce a shared random secret key which is known only to them. They can then use this key to encrypt and decrypt messages passed between them. Keys are generated using photons, which are produced using LEDs. These photons are then polarised using polarising filters. An important property of quantum cryptography is the ability to detect the presence of a third party who is attempting to eavesdrop on the transmission of the secret key, and who would otherwise be able to encrypt and decrypt messages themselves. This is possible because of a fundamental principle of quantum mechanics: the process of measuring a quantum system in general disturbs the system.

The Power of Big Data: Presentation Transcript

  • The Power of Big Data Tim Wiles, Iain Batty and David Turnbull 31 January 2014
  • Outline • Big Data (Iain Batty) • NoSQL: The future of data storage? (Tim Wiles) • Data security (David Turnbull)
  • Big Data
  • What is Data? • Data is everywhere • You have more than you think • It’s your biggest asset
  • So What Is “Big” Data? • Many Definitions • Study by Ward & Barker of St Andrews • “Big data is a term describing the storage and analysis of large and or complex data sets using a series of techniques including, but not limited to: NoSQL, Map Reduce and machine learning.”
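The Ward & Barker definition names MapReduce as one of the techniques. The sketch below is a rough, single-process illustration only. It runs the three MapReduce phases (map, shuffle, reduce) to count words. A real framework such as Hadoop would distribute each phase across many machines. The input documents and function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in one input document.
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group emitted values by key, as the framework would do between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data is big", "data is everywhere"]))
# {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Map and reduce work independently per document and per key, which is what lets a framework spread them over many machines.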
  • So What Is “Big” Data? • We have a huge amount of data: – 90% of data was created in the last two years – 2.5 exabytes (2.5×10^18 bytes) of data created every day • Data analysis on a huge scale
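To put 2.5 exabytes a day in perspective, the snippet below converts it into megabytes and into 160 GB iPod Classics. It uses decimal (SI) units to match the 2.5×10^18 on the slide. A binary reading (2^60 bytes per "exabyte") gives larger numbers. The source of the statistic does not say which convention it uses, so both are computed here.

```python
# Decimal (SI) and binary interpretations of "2.5 exabytes".
daily_bytes_si = 2.5 * 10**18    # 2.5 EB, 1 EB = 10**18 bytes
daily_bytes_bin = 2.5 * 2**60    # 2.5 EiB, 1 EiB = 2**60 bytes

# Megabytes (10**6 bytes) versus mebibytes (2**20 bytes).
mb_si = daily_bytes_si / 10**6
mib_bin = daily_bytes_bin / 2**20

# 160 GB iPod Classics, measured in the matching convention.
ipods_si = daily_bytes_si / (160 * 10**9)
ipods_bin = daily_bytes_bin / (160 * 2**30)

print(f"SI:     {mb_si:,.0f} MB = {ipods_si:,.0f} iPods")
print(f"binary: {mib_bin:,.0f} MiB = {ipods_bin:,.0f} iPods")
```

In SI units this gives 2.5 trillion MB and 15,625,000 iPods. In binary units it gives about 2.75 trillion MiB and 16,777,216 iPods. Mixing the two conventions in one calculation produces a number that matches neither.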
  • THANKS TO BIG DATA…
  • How and Why Big Data is Used • Healthcare • Scientific Research (Folding@Home, SETI) • Market Research • Business Operation Optimisation
  • Why Use Big Data • Investigative and predictive • Increasing public access to data • Enables high-level understanding of previously unfathomable datasets
  • Why Not to Use Big Data • Expensive • Limited pool of talent • Not always applicable • Must be used correctly: correlation does not mean causation ...yaaaarrrr?!
  • Conclusion • Big Data technologies may or may not be right for you • But the principles are universal: – Gather your data – Use novel sources such as social media and public data initiatives – Analyse it intelligently
  • NoSQL: The future of data storage?
  • 20 years ago… Hard drives ~ 500 MB Floppy disks ~ 1.44 MB Modems ~ 28-56 Kbps Digital cameras emerging BBC front page (1996): bit.ly/Kc6ojz
  • Today… BBC front page (today): bbc.in/18lsxlx
  • Data storage
    Relational (SQL)     NoSQL
    Highly structured    Flexible structure
    Single type          Many types
    £                    ££££
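The ACID guarantees of relational databases can be seen in a small experiment. This sketch of atomicity uses Python's built-in `sqlite3` module, since SQLite is a relational database with ACID transactions. A hypothetical money transfer fails halfway through, and the whole transaction is rolled back. The table, account names and amounts are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move money between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            # Violates the CHECK constraint if src would go negative.
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances() -> dict[str, int]:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer("alice", "bob", 30), balances())   # True {'alice': 70, 'bob': 80}
print(transfer("alice", "bob", 500), balances())  # False {'alice': 70, 'bob': 80}
```

In the failed transfer, Bob's credit had already been applied when Alice's debit was rejected. The rollback undoes it, so the database never shows money appearing from nowhere.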
  • Which horse do you back?
  • VHS vs Betamax
  • HD-DVD vs Blu-ray
  • Flavours of NoSQL
    Key-value: Amazon Dynamo, Apache Cassandra
    Column: HBase, Google BigTable
    Document: CouchDB, MongoDB
    Graph: AllegroGraph, Neo4j
  • Comparing the options • Right tool for the job? • Relational database → Can be adapted. • NoSQL database → Specialised problem solving.
  • Relational Database
    EmployeeID  Employee
    1           Tim Wiles
    2           Iain Batty
    3           David Turnbull

    PayID  Payment Method
    1      Salaried
    2      Ad Hoc
    3      Digestive Biscuits

    EmployeeID  PayID
    1           3
    2           1
    3           1
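The three tables on this slide can be recreated and joined with SQL. This is a sketch using Python's `sqlite3` module. The table and column names (`employees`, `pay_methods`, `employee_pay`) are assumptions, because the slide only shows the column headings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pay_methods (pay_id INTEGER PRIMARY KEY, method TEXT);
    -- Link table: which payment method each employee receives.
    CREATE TABLE employee_pay (
        employee_id INTEGER REFERENCES employees(employee_id),
        pay_id INTEGER REFERENCES pay_methods(pay_id)
    );
    INSERT INTO employees VALUES (1, 'Tim Wiles'), (2, 'Iain Batty'), (3, 'David Turnbull');
    INSERT INTO pay_methods VALUES (1, 'Salaried'), (2, 'Ad Hoc'), (3, 'Digestive Biscuits');
    INSERT INTO employee_pay VALUES (1, 3), (2, 1), (3, 1);
""")

# Answering "how is each employee paid?" means joining all three tables.
rows = conn.execute("""
    SELECT e.name, p.method
    FROM employee_pay AS ep
    JOIN employees AS e ON e.employee_id = ep.employee_id
    JOIN pay_methods AS p ON p.pay_id = ep.pay_id
    ORDER BY e.employee_id
""").fetchall()

for name, method in rows:
    print(f"{name}: {method}")
```

The structure is fixed up front, and every question that spans tables needs a join. That rigidity is the trade-off the NoSQL slides contrast against.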
  • Key-value stores
    Key     Value
    teh     the
    hlelo   hello
    edn     end
    tol     tool
    …
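A key-value store maps each unique key to a value, and such stores are expected to be spread over many machines. This is a minimal sketch using the typo-correction pairs from the slide. It assumes a hypothetical cluster of three in-memory "nodes" and plain hash partitioning. Real systems such as Amazon Dynamo use more elaborate schemes, including consistent hashing and replication.

```python
import hashlib
from typing import Optional


class PartitionedKVStore:
    """Toy key-value store that spreads keys over several in-memory 'nodes'."""

    def __init__(self, num_nodes: int) -> None:
        # Each dict stands in for one (hypothetical) machine in the cluster.
        self.nodes: list[dict[str, str]] = [{} for _ in range(num_nodes)]

    def _node_for(self, key: str) -> dict[str, str]:
        # Stable hash (Python's built-in hash() is salted per process),
        # so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key: str, value: str) -> None:
        self._node_for(key)[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._node_for(key).get(key)


store = PartitionedKVStore(num_nodes=3)
for typo, correction in [("teh", "the"), ("hlelo", "hello"), ("edn", "end"), ("tol", "tool")]:
    store.put(typo, correction)

print(store.get("hlelo"))                   # hello
print([len(node) for node in store.nodes])  # how the four keys ended up spread
```

The node is chosen from the key alone, so a lookup goes straight to one machine. The weakness of plain modulo hashing is that changing the number of nodes moves most keys. Consistent hashing is the usual remedy.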
  • Column stores
    Item Name          Number Of Sales   Total Cost (£)   Total Revenue (£)   Origin
    Orange Juice               152,000           76,000             152,000   Spain
    Apple Juice                137,000           54,800             123,300   UK
    Pineapple Juice             63,000           37,800              78,750   Brazil
    Grape Juice                 84,000           46,200              92,400   Spain
    Total number of sales = 436,000
    Profit (total revenue minus total cost) = £231,650
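In a column store each column is kept together, so an aggregate reads only the columns it needs. This minimal sketch holds the juice table above column by column in plain Python lists. It is a simplification: real column stores add compression and specialised on-disk layouts.

```python
# Each column is stored contiguously, keyed by column name.
juice_sales = {
    "item_name": ["Orange Juice", "Apple Juice", "Pineapple Juice", "Grape Juice"],
    "number_of_sales": [152_000, 137_000, 63_000, 84_000],
    "total_cost_gbp": [76_000, 54_800, 37_800, 46_200],
    "total_revenue_gbp": [152_000, 123_300, 78_750, 92_400],
    "origin": ["Spain", "UK", "Brazil", "Spain"],
}

# Aggregates scan only the columns involved, not whole rows.
total_sales = sum(juice_sales["number_of_sales"])
profit = sum(juice_sales["total_revenue_gbp"]) - sum(juice_sales["total_cost_gbp"])

# A simple pivot: total revenue grouped by origin.
revenue_by_origin: dict[str, int] = {}
for origin, revenue in zip(juice_sales["origin"], juice_sales["total_revenue_gbp"]):
    revenue_by_origin[origin] = revenue_by_origin.get(origin, 0) + revenue

print(total_sales)        # 436000
print(profit)             # 231650
print(revenue_by_origin)  # {'Spain': 244400, 'UK': 123300, 'Brazil': 78750}
```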
  • Document stores
    Company
    Location 1 (City: Durham)
    Employee List
    Employee 1 (Name: Tim Wiles, Age: 26)
    Location 2 (City: London)
    Employee 2 (Start Date: 31/03/2013, Name: David Turnbull, Age: 27)
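Document stores keep records with different fields side by side. This sketch uses Python's `json` module to model a company document like the one on the slide, where only one employee has a start date. The field names and nesting are assumptions. A real document database such as MongoDB or CouchDB would store and query the documents for you.

```python
import json

# A single "company" document: nested structure, no fixed schema.
company = {
    "locations": [{"city": "Durham"}, {"city": "London"}],
    "employees": [
        {"name": "Tim Wiles", "age": 26},
        # This employee has an extra field; no schema change is needed.
        {"name": "David Turnbull", "age": 27, "start_date": "31/03/2013"},
    ],
}

stored = json.dumps(company)  # roughly what a document store would persist
loaded = json.loads(stored)

# Fields that are missing are simply absent, so readers supply defaults.
for employee in loaded["employees"]:
    print(employee["name"], employee.get("start_date", "start date not recorded"))
```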
  • Graph stores (diagram: nodes linked by edges labelled Enemy and “Friend”)
  • Case Study: Middle Earth University Introduction to Alchemy Wed 11AM Advanced Alchemy Wed 1PM World Domination Wed 9AM Introduction to Magic Wed 11AM Advanced Magical Techniques Wed 9AM
  • Case Study: Middle Earth University Advanced Magical Techniques Wed 9AM
  • Case Study: Middle Earth University Introduction to Alchemy Wed 11AM All courses running at 11AM on Wednesday Introduction to Magic Wed 11AM
  • Case Study: Middle Earth University Introduction to Alchemy Wed 11AM Advanced Alchemy BMag Wed 1PM World Domination Wed 9AM Introduction to Magic Wed 11AM Advanced Magical Techniques Wed 9AM MMag DMag
  • Case Study: Middle Earth University Advanced Alchemy Wed 1PM DMag Advanced Magical Techniques Wed 9AM
  • Case Study: Middle Earth University Introduction to Alchemy Shire Lecture Hall Wed 11AM Advanced Alchemy Wed 1PM Mordor Seminar Room World Domination Wed 9AM BMag Introduction to Magic Wed 11AM MMag Advanced Magical Techniques Wed 9AM DMag
  • Case Study: Middle Earth University Introduction to Alchemy Wed 11AM Advanced Alchemy Wed 1PM Mordor Seminar Room
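The case study treats courses, time slots, degrees and rooms as nodes, and the links between them as edges. This minimal sketch uses a plain adjacency map. The course-to-time edges come from the slides. The degree edges (DMag linked to Advanced Alchemy and Advanced Magical Techniques) are read off the diagram residue and should be treated as assumptions. A graph database such as Neo4j would answer the same questions with a query language instead of hand-written loops.

```python
from collections import defaultdict

edges = [
    ("Introduction to Alchemy", "Wed 11AM"),
    ("Advanced Alchemy", "Wed 1PM"),
    ("World Domination", "Wed 9AM"),
    ("Introduction to Magic", "Wed 11AM"),
    ("Advanced Magical Techniques", "Wed 9AM"),
    # Assumed from the diagram: courses linked to the DMag degree.
    ("DMag", "Advanced Alchemy"),
    ("DMag", "Advanced Magical Techniques"),
]

# Undirected graph: record each edge in both directions.
graph: dict[str, set[str]] = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)


def neighbours(node: str) -> list[str]:
    return sorted(graph[node])


print(neighbours("Wed 11AM"))  # ['Introduction to Alchemy', 'Introduction to Magic']
print(neighbours("DMag"))      # ['Advanced Alchemy', 'Advanced Magical Techniques']

# Two hops: the time slots a DMag student is committed to.
print(sorted({slot for course in graph["DMag"] for slot in graph[course] if slot != "DMag"}))
```

The two-hop query prints `['Wed 1PM', 'Wed 9AM']`. Following links like this is the kind of question that needs several joins in a relational database.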
  • Is NoSQL for everyone? • Most businesses function effectively using only relational databases. • It is not the grand solution to all data storage problems. • You need to train or employ staff with NoSQL knowledge.
  • However…
  • NoSQL is showing significant promise for certain aspects of almost any business.
  • Reasons to use NoSQL in your business • Potentially significant financial savings. • Easy to adapt stored data as your business grows and your priorities change. • Can exceed the performance of popular commercial relational databases for some workloads.
  • Reasons to use NoSQL in your business Effective tool for a holistic approach to analysing the growth/status of your business.
  • Reasons to use NoSQL in your business Relational databases are not the only solution to your data storage problems.
  • Data Security
  • Why Is Data Security Important? • The cost of a data breach continues to rise • Fewer customers remain loyal after a data breach • Reputation losses and diminished goodwill: lost business cost has steadily increased over the last 6 years (£500 thousand in 2007) • Malicious or criminal attacks are the most costly
  • The Cost Of a Data Breach 2013 Cost Of Data Breach Study: United Kingdom [Ponemon Institute, May 2013]
  • The Causes Of a Data Breach 2013 Cost Of Data Breach Study: United Kingdom [Ponemon Institute, May 2013]
  • Current Methods Of Authentication
    1. Basic user names and passwords
    2. Biometrics
       • Fingerprint scanners
       • Voice recognition
       • Face scanning and recognition
       • Retina and iris scans
    3. Multi-Factor Authentication
       • Something possessed, as in a physical token or telephone
       • Something known, such as a password or mother’s maiden name
       • Something inherent, like a biometric trait
  • Pros and Cons Of These Methods 1. Standard username and password authentication is extremely vulnerable to rainbow table attacks when passwords are stored as unsalted hashes 2. It relies on the ability of the system's users to pick secure passwords (image: the Adobe crossword)
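Rainbow tables are precomputed hash-to-password lookups, so they only work against unsalted hashes. The usual defence is a unique random salt per user plus a deliberately slow key-derivation function. This is a sketch using Python's standard `hashlib.pbkdf2_hmac`. The iteration count is an illustrative assumption, not a recommendation; check current guidance for your own system.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor only


def hash_password(password: str) -> tuple[bytes, bytes]:
    # A fresh random salt per user means one precomputed table cannot cover everyone.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)  # constant-time comparison


salt_a, hash_a = hash_password("password123")
salt_b, hash_b = hash_password("password123")
print(hash_a != hash_b)                                # same password, different stored hashes
print(verify_password("password123", salt_a, hash_a))  # True
print(verify_password("letmein", salt_a, hash_a))      # False
```

Salting does not make a weak password strong. It only forces an attacker to attack each account separately, which is why the slide's second point about users choosing good passwords still applies.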
  • Pros and Cons Of These Methods • In theory, biometrics is a great way to authenticate a user. It's impossible to lose your fingerprints, unless you have both your hands chopped off.
  • The Best Solution • Multi-factor Authentication. A security measure that requires two or more kinds of evidence that you are who you say you are. • Authentication requires a combination of these bits of evidence rather than simply using one or the other. • Something you know – Username, Password • Something you have – An RSA Key, Credit Card • Something inherent – A fingerprint, retina scan • Multi-factor Authentication is very secure, but it is hard to implement everywhere. • Requires users to remember to carry their RSA keys with them.
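"Something you have" is often a hardware token or phone app that generates short-lived one-time codes from a shared secret. RSA SecurID tokens use RSA's own algorithm. As a swapped-in illustration, this sketch implements the open TOTP standard (RFC 6238, built on HOTP from RFC 4226) with only the Python standard library. It is checked against the RFCs' published test vectors.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low 4 bits pick a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based OTP (RFC 6238): the counter is the number of time steps since the epoch."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


# Test vector from RFC 6238 (SHA-1, T = 59 seconds, 8 digits).
print(totp(b"12345678901234567890", at=59, digits=8))  # 94287082
```

The server runs the same calculation, so a correct code proves possession of the secret without ever sending it. In practice servers usually also accept a code from an adjacent time step to allow for clock drift.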
  • Emerging Methods Of Authentication • YubiKey – Authentication method based on a unique physical token which cannot be duplicated or recorded, providing a credential based on something only an authorised user possesses. • Can also be used with password managers such as LastPass
  • How Does YubiKey Work?
  • Quantum Cryptography What is quantum cryptography? The use of quantum mechanical effects to perform cryptographic tasks or to break cryptographic systems. What does that mean exactly? • Using physics rather than mathematics to perform cryptographic tasks, such as generating cryptographic keys • Moreover, quantum cryptography addresses the problem of key distribution
  • Quantum Cryptography How does it work? • It works by using a technique called Quantum Key Distribution (QKD). QKD enables two parties to produce a shared random secret key which is known only to them. They can then use this key to encrypt and decrypt messages passed between them. • Keys are generated using photons, which are produced using LEDs. These photons are polarised using polarising filters and then transmitted. • The two parties agree on which filter orientations (bases) may be used and assign a value, usually a binary value, to each polarisation; for every photon, each party picks one of the bases at random. • After the whole transmission, they compare which basis each of them used for each photon and keep only the bits where the bases matched; these bits form a unique shared key.
  • Quantum Cryptography What are the benefits of using quantum cryptography? • An important property of quantum cryptography is the ability to detect the presence of a third party attempting to eavesdrop on the transmission of the secret key • This is achieved because of a fundamental principle of quantum mechanics: the process of measuring a quantum system in general disturbs the system.
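The QKD steps described for quantum cryptography follow the BB84 protocol. This is only a classical simulation (no real photons), and it assumes ideal, noise-free equipment. Each bit is sent in a randomly chosen basis. The two parties keep only the positions where their bases matched. An eavesdropper who measures in random bases introduces errors into that sifted key, about 25% in expectation, which the parties can detect by comparing a sample.

```python
import random

BASES = ("+", "x")  # rectilinear and diagonal polarisation bases


def measure(bit: int, sent_basis: str, measure_basis: str, rng: random.Random) -> int:
    # Matching basis: the bit is read correctly. Mismatched basis: the outcome is random.
    return bit if sent_basis == measure_basis else rng.randint(0, 1)


def bb84(n_photons: int, eavesdrop: bool, seed: int = 0) -> tuple[list[int], list[int]]:
    rng = random.Random(seed)
    alice_bits = [rng.randint(0, 1) for _ in range(n_photons)]
    alice_bases = [rng.choice(BASES) for _ in range(n_photons)]
    bob_bases = [rng.choice(BASES) for _ in range(n_photons)]

    bob_bits = []
    for bit, a_basis, b_basis in zip(alice_bits, alice_bases, bob_bases):
        photon_bit, photon_basis = bit, a_basis
        if eavesdrop:
            # Eve measures in a random basis and re-sends what she saw, disturbing the photon.
            e_basis = rng.choice(BASES)
            photon_bit = measure(photon_bit, photon_basis, e_basis, rng)
            photon_basis = e_basis
        bob_bits.append(measure(photon_bit, photon_basis, b_basis, rng))

    # Sifting: bases (not bits) are compared publicly; only matching positions are kept.
    keep = [i for i in range(n_photons) if alice_bases[i] == bob_bases[i]]
    return [alice_bits[i] for i in keep], [bob_bits[i] for i in keep]


def error_rate(a: list[int], b: list[int]) -> float:
    return sum(x != y for x, y in zip(a, b)) / len(a)


for eve in (False, True):
    alice_key, bob_key = bb84(4000, eavesdrop=eve, seed=42)
    rate = error_rate(alice_key, bob_key)
    print(f"eavesdropper={eve}: {len(alice_key)} sifted bits, error rate {rate:.1%}")
```

Without an eavesdropper the sifted keys match exactly. With one, roughly a quarter of the sifted bits disagree. In the real protocol the parties sacrifice a random sample of the sifted key to estimate this error rate, and abandon the key if it is too high.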
  • Questions
  • References
    1. http://www.technologyreview.com/view/519851/the-big-data-conundrum-how-to-define-it/
    2. http://en.wikipedia.org/wiki/Big_data#cite_note-15
    3. 2013 Cost Of Data Breach Study: United Kingdom [Ponemon Institute, May 2013]
    4. http://www.yubico.com/products/yubikeyhardware/yubikey/technical-description/
    5. https://wiki.archlinux.org/index.php/yubikey#How_does_it_work
  • Upcoming Seminars • Capturing the Real Value of IT Service Management: Friday 14th February • Preparing for BYOD & Mobile Device Management: Friday 28th February