Whynosql

Talk to TechMeetup Aberdeen on big data and NoSQL.
Some links seem to be missing from the on-screen presentation, particularly http://www.dbshards.com/dbshards/ for the sharding diagram.

Notes
  • http://www.greenbookblog.org/2012/03/21/big-data-opportunity-or-threat-for-market-research/
  • http://news.softpedia.com/news/Twitpocalypse-039-s-Aftermath-114084.shtml
  • http://www.dbshards.com/dbshards/
  • http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed (picture: http://www.datacenterknowledge.com/archives/2009/11/04/inside-a-cloud-computing-data-center/)
  • Larry Ellison must be mad that his “free” software, MySQL, is used on the biggest website in the world.
  • create keyspace test with strategy_class = 'SimpleStrategy' AND strategy_options:replication_factor=1;
    use test;
    create columnfamily users (KEY varchar Primary key, password varchar, gender varchar);
    INSERT INTO users (KEY, password) VALUES ('jsmith', 'ch@ngem3a');
    Select * from users;
    INSERT INTO users (KEY, gender) VALUES ('jbrown', 'male');
    INSERT INTO users (KEY, phone) VALUES ('jbrown', '01382 345078');
    What are we going to get?
  • Transcript

    • 1. Andy Cobley, School of Computing, University of Dundee. Twitter: @andycobley
    • 2. Who am I? Lecturer at the University of Dundee; program director of Business Intelligence and of the new Data Science program (http://goo.gl/ljl6N and http://goo.gl/uwHSi); geek and hacker.
    • 3. So what is Big Data?
    • 4. From evil Wikipedia: “In information technology, big data consists of datasets that grow so large that they become awkward to work with using on-hand database management tools.” Which doesn’t tell us much. Any definition that relies on data “size” will become obsolete very quickly as data storage capabilities grow.
    • 5. Let’s try something different: the three V’s. Volume: how big is the data? Terabytes? Petabytes? Variety: is it all the same sort of data? What about blobs? Does it change? Velocity: how fast is it coming in? Can we store it fast enough and then use it? (http://nosql.mypopescu.com/post/5547192335/bigdata-the-three-vs-volume-variety-velocity)
    • 6. The Twitter problem: the Twitpocalypse, when status IDs overflowed 32-bit signed integers. But beyond that, can we physically store data fast enough?
    • 7. Suppose we are storing 16 columns of 16 bytes (256 bytes per row). At 100 rows per second that’s about 0.7 terabytes per year; at 1 million rows per second it’s about 7 petabytes per year. This is volume.
    • 8. Variability: data is sparse and can vary in size, and over time the type of data changes. Consider click-through data: as pages evolve, new data types and fields need to be stored.
    • 9. What about this? [Diagram: a record with an id, MassSpec data and several Meta data fields]
    • 10. We need UDFs (user-defined functions) inside the DB, or a different way of dealing with it, such as Hadoop or MRSQL.
    • 11. So what is NoSQL? It throws away everything you know about databases. It is a family of different databases, with lots of different “products”. BUT! (http://nosql.mypopescu.com/post/1016320617/mongodb-is-web-scale, warning: might offend) They should only be used when it’s sensible; they are not magic sauce.
    • 12. NoSQL types: key-value, column-family and document databases, which allow sharding across nodes; graph databases, which are fast for graph-like data and operations.
    • 13. Some NoSQL databases: CouchDB, MongoDB, Cassandra, Riak, HBase, Neo4j (http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis)
    • 14. Sharding? Distribution of data across nodes, which allows the load to be spread across multiple machines. SQL databases can be sharded; not all NoSQL databases can be.
    • 15. CAP theorem: CAP (or Brewer’s) theorem says it’s impossible for a web service to provide all three of the following at the same time: consistency, availability and partition tolerance (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.1495&rep=rep1&type=pdf). But see http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed and http://codahale.com/you-cant-sacrifice-partition-tolerance/
    • 16. http://blog.nahurst.com/visual-guide-to-nosql-systems
    • 17. Partitions? Essentially, failing to achieve consistency within a set time amounts to a partition, and you can sacrifice availability to ensure consistency. Partitions are rare, and with a single server they almost never happen. Partitions are caused by network failures and failed nodes.
    • 18. Eventual consistency: eventually, all nodes will tell the same story. Isn’t this a mad idea? Facebook (actually not). The Internet is based on an eventually consistent DB: DNS.
    • 19. Introducing Cassandra: a distributed/decentralized, column-oriented, fault-tolerant key-value store.
    • 20. Network topology of a Cassandra DB: multiple nodes; Cassandra can be rack aware; keys are replicated across nodes. It’s essentially a DHT (distributed hash table). Think BitTorrent.
    • 21. CQL: version 0.8 introduced CQL, the Cassandra Query Language. It almost looks like SQL! (http://crlog.info/2011/09/17/cassandra-query-language-cql-v2-0-reference/) Language reference: http://www.datastax.com/docs/0.8/dml/using_cql
    • 22. Demo: start Cassandra, open cqlsh, create a keyspace, create a column family. Now we can insert!
    • 23. So why does this work? jsmith holds password: ch@ngem3a; jbrown holds gender: male and phone: 01382 345078. It’s a column store: keys with name: value pairs underneath.
    • 24. Interfacing to Cassandra: based on Thrift (http://thrift.apache.org/), with a large number of languages supported (http://wiki.apache.org/cassandra/ClientOptions). I’ve used Java and Hector (http://prettyprint.me/), although there is a C# version (http://hectorsharp.com/).
    • 25. Cassandra JDBC: very new, so it’s difficult to know how stable it is. It needs compiling, and it needs libraries that aren’t in Cassandra! (http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/)
    • 26. Astyanax: from Netflix, based on Hector but said to be a lot simpler! (https://github.com/Netflix/astyanax/wiki)
    • 27. jBloggyAppy, a demo app of Cassandra: a simple blogging app. All source code is on GitHub (https://github.com/acobley/jBoggyAppy); feel free to use and abuse it.
    • 28. A word on using open source software: versioning! Things change! Documentation is wrong! (http://prettyprint.me/) You end up reading unit tests to actually program.
    • 29. One last thing: Dundee DDD, 17th November, Big Data track. Anyone interested in speaking?
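Slide 6’s Twitpocalypse is a fixed-width integer overflow. A signed 32-bit field tops out at 2,147,483,647, so once Twitter’s status IDs passed that value, clients that stored them in such a field saw them wrap around to negative numbers. The minimal Python sketch below emulates two’s-complement 32-bit storage. `as_int32` is a hypothetical helper written for illustration and does not come from any Twitter client.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold

def as_int32(value: int) -> int:
    """Emulate storing an arbitrary int in a signed 32-bit (two's-complement) field."""
    return (value + 2**31) % 2**32 - 2**31

last_safe_id = INT32_MAX
next_id = INT32_MAX + 1

print(as_int32(last_safe_id))  # 2147483647: still fits
print(as_int32(next_id))       # -2147483648: wraps to the most negative value
```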
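Slide 7’s volume figures can be checked directly. The calculation below assumes a 365-day year and counts only the raw 256-byte rows, with no indexes, replication or storage overhead. On that basis, 100 rows per second comes to 807,321,600,000 bytes a year, which is about 0.73 TiB (0.81 decimal TB). One million rows per second comes to about 7.17 PiB (8.07 decimal PB). The slide’s 0.7 and 7 therefore read as binary units. A small sketch of the arithmetic:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s, ignoring leap years
ROW_BYTES = 16 * 16                    # 16 columns x 16 bytes = 256 bytes per row

def bytes_per_year(rows_per_second: int) -> int:
    """Raw payload written in one year, ignoring indexes, replication and overhead."""
    return ROW_BYTES * rows_per_second * SECONDS_PER_YEAR

for rate in (100, 1_000_000):
    total = bytes_per_year(rate)
    # Show both decimal (TB/PB) and binary (TiB/PiB) units for comparison.
    print(f"{rate:,} rows/s: {total / 1e12:,.2f} TB ({total / 2**40:,.2f} TiB), "
          f"{total / 1e15:,.4f} PB ({total / 2**50:,.4f} PiB) per year")
```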
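Slide 10 mentions Hadoop as an alternative to pushing user-defined functions into the database. Hadoop implements the MapReduce model. A map step emits key/value pairs, the framework groups them by key, and a reduce step combines each group. The pure-Python sketch below only illustrates that model on hypothetical click-through records, the example from slide 8. It does not use Hadoop’s API, and the `page,date` line format is an assumption.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(log_lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    # Emit (page, 1) for every click; the page is assumed to be the first field.
    for line in log_lines:
        page = line.split(",")[0]
        yield page, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Combine each key's values into a single result (here, a click count).
    return {key: sum(values) for key, values in groups.items()}

clicks = ["/home,2012-05-01", "/blog,2012-05-01", "/home,2012-05-02"]
print(reduce_phase(shuffle(map_phase(clicks))))
```

Running it prints `{'/home': 2, '/blog': 1}`. In Hadoop the same three phases run across many machines, and the framework handles the grouping step.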
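Slide 14 describes sharding as distributing data across nodes. One common way to do this is to hash each key with a stable hash and take the result modulo the number of shards. This is a sketch only and does not show how any particular product, dbShards included, does it. The key names and shard counts are hypothetical.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    # Use a stable hash (not Python's per-process salted hash()) so every client agrees.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

users = ["jsmith", "jbrown", "acobley", "alice", "bob"]
placement = {user: shard_for(user, 4) for user in users}
print(placement)

# The catch with plain modulo placement: changing the shard count moves many keys.
moved = sum(shard_for(u, 4) != shard_for(u, 5) for u in users)
print(f"{moved} of {len(users)} keys change shard when going from 4 to 5 shards")
```

The last lines show the weakness of plain modulo placement. Adding a fifth shard typically moves most keys, about four in five on average. That is one reason systems like Cassandra place keys on a hash ring instead.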
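Slide 18’s eventual consistency can be illustrated with replicas that accept writes independently and reconcile later. The toy sketch below stores every column with a timestamp and keeps the newest version. This is last-write-wins, with ties broken by comparing values so that every replica picks the same winner. Cassandra also resolves conflicting column writes by timestamp, but this is a simplified model, and the class and function names are hypothetical. `anti_entropy` stands in for the background repair a real system would run.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Replica:
    # Each column maps to (timestamp, value).
    data: dict[str, tuple[int, str]] = field(default_factory=dict)

    def write(self, column: str, value: str, timestamp: int) -> None:
        # Keep the incoming version only if it is newer; on equal timestamps,
        # the larger value wins so that all replicas agree on the outcome.
        current = self.data.get(column)
        if current is None or (timestamp, value) > current:
            self.data[column] = (timestamp, value)

    def read(self, column: str) -> Optional[str]:
        entry = self.data.get(column)
        return entry[1] if entry else None

def anti_entropy(a: Replica, b: Replica) -> None:
    # Exchange every column in both directions; write() keeps the winning version.
    for column, (ts, value) in list(a.data.items()):
        b.write(column, value, ts)
    for column, (ts, value) in list(b.data.items()):
        a.write(column, value, ts)

r1, r2 = Replica(), Replica()
r1.write("gender", "male", timestamp=1)          # this write reaches r1 only
r2.write("phone", "01382 345078", timestamp=2)   # this write reaches r2 only
print(r1.read("phone"), r2.read("gender"))       # None None: the replicas disagree
anti_entropy(r1, r2)
print(r1.data == r2.data)                        # True: they now tell the same story
```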
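Slide 20 calls Cassandra essentially a DHT. A minimal consistent-hashing ring captures the idea. Every node takes a token on a ring, and a key belongs to the first node whose token is greater than or equal to the key’s token, wrapping around at the end. Extra replicas go to the following nodes, similar in spirit to the `SimpleStrategy` used for the demo keyspace in the slide notes. The token function (the first 8 bytes of an MD5 digest) and the node names are assumptions for illustration.

```python
import bisect
import hashlib

def token(key: str) -> int:
    # Assumed token function: first 8 bytes of the key's MD5 digest as an unsigned int.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

class Ring:
    def __init__(self, nodes: list[str]) -> None:
        # Place every node on the ring at its own token, sorted clockwise.
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str, replication_factor: int) -> list[str]:
        # Primary: first node whose token is >= the key's token, wrapping past the end.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        # Further replicas: the next nodes clockwise, never more than the ring holds.
        count = min(replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("jsmith", replication_factor=3))
```

A new node only takes over part of one existing node’s range, so keys outside that range keep their primary node. Plain modulo sharding, by contrast, reshuffles most keys when the node count changes.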
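Slide 23 answers the demo question from the slide notes (“What are we going to get?”). Each row key holds only the name/value pairs written to it, so jsmith and jbrown end up with different columns. Below is a toy Python model of that sparse layout, with nested dictionaries standing in for the column family. This is not how Cassandra stores data on disk, and `insert` is a hypothetical helper.

```python
from collections import defaultdict

# A column family modelled as: row key -> {column name: value}.
# Rows hold only the columns that were actually written, so they can differ.
users: dict[str, dict[str, str]] = defaultdict(dict)

def insert(row_key: str, **columns: str) -> None:
    # Add or overwrite the given columns for this row; other columns are untouched.
    users[row_key].update(columns)

insert("jsmith", password="ch@ngem3a")
insert("jbrown", gender="male")
insert("jbrown", phone="01382 345078")  # not declared when the column family was created

for key, columns in users.items():
    print(key, columns)
```

It prints `jsmith {'password': 'ch@ngem3a'}` followed by `jbrown {'gender': 'male', 'phone': '01382 345078'}`.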