BigData Workshop Introduction Session - Ahmedabad Java Meetup
 



    Presentation Transcript

    • Big Data Workshop – An introduction and workshop launch session
        For the Ahmedabad Java Meetup Group (300+ members strong now!)
        May 2014
        Dhruv Gohil from Ishi systems
    • Welcome!
        ● Why a workshop and not a presentation?
        ● What should you do in the workshop?
        ● What is expected from you in this session?
        ● What should you expect from this session?
        ● What will the upcoming sessions be like?
    • Seems too serious? So, let's change the font! Now, this is much better!
    • OK... so what are we going to do today?
        ➔ Workshop setup and series introduction: already done! (See, it's easy!)
        ➔ Big is not only 'big'
        ➔ Why do we need 'Big data'?
        ➔ What 'Big data' is NOT
        ➔ Fear of Big data? Kick it off!
    • Let me tell you a story... http://en.wikipedia.org/wiki/Information_Management_System
    • If you still think in terms of 'Entities' and 'Tables': everything you have been taught in college about databases is ALL WRONG. http://slideshot.epfl.ch/play/suri_stonebraker
    • Big Data is...
    • Big Data is not only 'big': Volume, Velocity, Variety
        ➔ GB/TB vs. PB/EB
        ➔ Centralized vs. distributed
        ➔ Structured vs. semi-structured/unstructured
        ➔ Data model vs. schema
        ➔ Known relationships vs. flexible associations
    • What 'Big data' is NOT? You use Hadoop because you have big data; having Hadoop does not mean you have big data!
    • What 'Big data' is NOT? Applying for a job here? Anything less than Hadoop is practically an insult!
    • What 'Big data' is NOT?
        ➔ Why does Hadoop always come to mind with big data? What else should we know?
        ➔ Tools vs. methodologies
        ➔ Being too futuristic vs. being practical/economical
    • Big Data in your organization http://www.fakingnews.firstpost.com/2014/04/transcript-of-rahul-gandhis-interview-for-job-of-a-c-programmer/ We brought RTSC. Right To Source Code. Now, deal with it.
    • Big Data in your organization
        ➢ The cost of tools/software decreases, but the cost of knowledge increases
        ➢ Being agile is the only way to deal with competition
        ➢ Are you working with...
            ✔ Social networking and media
            ✔ Mobile devices
            ✔ Internet transactions
            ✔ Networked devices and sensors
    • Big Data in your product/service
        ● Shift your thinking toward how data is accessed rather than how it is stored
        ● Design based on when/where data is used rather than when/where data is produced
        ● Use redundancy instead of optimizing for storage cost
        ● Understand NoSQL = Not Only SQL
            ✔ Streams
            ✔ In-memory analytics
            ✔ Massively parallel processing (data crunching)
    • Big Data in your project
        Random research says...
        ➔ 99% of your clients who asked for a Big Data project ended up with fewer paying customers than you have fingers.
        A project hits its business scalability limits much, much earlier than its technical scalability limits.
    • Big Data for your clients
        ➢ Business first, technology second
        ➢ Current reality for client projects:
            ✔ Use big data tools that also work at small scale :-)
            ✔ Design with the domain in mind, not the database the client suggests
        ➢ Always design with read optimization in mind (the golden rule)
    • Big Data projects for small-data customers: if you can do it in PostgreSQL, then do it in PostgreSQL (the blue elephant rule)
    • A few important tips...
    • The CAP theorem: basics of NoSQL databases
        Read a lot about the design of a database before using any non-traditional database, or read good negative posts to learn when NOT to use it, e.g.: http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
    • Now... the good parts! It's your time to speak now! Workshop session: practical selection of technology and design for real-world use cases.
    • All references used in the workshop
        ➔ Basic Hadoop introductory material: http://www.coreservlets.com/hadoop-tutorial/
        ➔ Evaluate Hadoop without installation: http://go.cloudera.com/cloudera-live.html
        ➔ PostgreSQL good parts: http://www.slideshare.net/Aveic/postgresql-34323147
        ➔ PostgreSQL as a NoSQL column store: http://postgresguide.com/sexy/hstore.html
        ➔ PostgreSQL for basic Elasticsearch functionality: http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/
        ➔ Good big-data-compatible OSS software: http://netflix.github.io/
        ➔ Practical HBase usage: https://www.facebook.com/UsingHbase
        ➔ Using Cassandra for write-heavy applications: http://www.datastax.com/1-million-writes
        ➔ Online analytics in Storm: http://hortonworks.com/hadoop/storm/
        ➔ E-commerce domain-specific use case: http://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376
        ➔ A good use case of selecting a data store based on a proper understanding of the CAP theorem: http://tech-blog.flipkart.net/2013/01/nosql-for-a-user-engagement-platform/
        ➔ Recommendation engine in Big Data scenarios: http://www.slideshare.net/hava101/recommendations-play-flipkart-14115791
        ➔ High-volume log processing: http://www.splunk.com/view/product-tour/SP-CAAAAGV; open source alternatives: http://logstash.net/ and http://graylog2.org/
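The 'Big Data in your product/service' slide lists massively parallel processing (data crunching) under NoSQL. Hadoop's own MapReduce tutorial starts with a word count, so here is a minimal sketch of the same map, shuffle and reduce idea inside a single JVM, using Java parallel streams. It is not the Hadoop API, and the class name `WordCountSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: plain Java streams, NOT the Hadoop MapReduce API.
class WordCountSketch {

    static Map<String, Long> countWords(List<String> lines) {
        return lines.parallelStream()
                // Map phase: turn each input line into a stream of lowercase words.
                .flatMap(line -> Arrays.stream(line.toLowerCase(Locale.ROOT).split("\\W+")))
                // A line starting with punctuation (or an empty line) yields an empty token; drop it.
                .filter(word -> !word.isEmpty())
                // Shuffle + reduce phase: group identical words together and count each group.
                // parallelStream() spreads the work over CPU cores on one machine,
                // loosely mirroring how Hadoop spreads it over many machines.
                .collect(Collectors.groupingByConcurrent(word -> word, Collectors.counting()));
    }
}
```

For example, `WordCountSketch.countWords(List.of("big data is big", "Data is data"))` counts big=2, data=3 and is=2; the iteration order of the returned map is not defined. On a real cluster the same three phases run across machines, with the framework moving the grouped data between them.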
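The 'Big Data for your clients' slide calls designing for read optimization the golden rule, and the product/service slide suggests using redundancy and designing around where data is used. A minimal in-memory sketch of that idea, assuming a hypothetical orders domain (`Order`, `CustomerSummary` and `ReadOptimizedStore` are illustrative names, not from the slides): each write also updates a denormalized per-customer summary, so the common read is one key lookup instead of a scan over every order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory store illustrating a read-optimized (denormalized) design.
class ReadOptimizedStore {

    // Hypothetical domain types; amount is in the smallest currency unit.
    record Order(String customerId, String customerName, long amount) {}

    record CustomerSummary(String customerName, int orderCount, long total) {}

    // Write side: every order, kept as the source of truth.
    private final List<Order> orderLog = new ArrayList<>();

    // Read side: one precomputed, redundant summary row per customer.
    private final Map<String, CustomerSummary> summaries = new HashMap<>();

    // Each write appends to the log AND updates the summary,
    // so reads never need to scan or aggregate the log.
    void placeOrder(Order order) {
        orderLog.add(order);
        summaries.merge(
                order.customerId(),
                new CustomerSummary(order.customerName(), 1, order.amount()),
                (current, added) -> new CustomerSummary(
                        current.customerName(),
                        current.orderCount() + added.orderCount(),
                        current.total() + added.total()));
    }

    // The frequent read is a single hash lookup; returns null for unknown customers.
    CustomerSummary summaryFor(String customerId) {
        return summaries.get(customerId);
    }

    // Number of orders in the write-side log.
    int orderCount() {
        return orderLog.size();
    }
}
```

The trade-off is the one the slides point to: writes do a little more work and the summary duplicates data already in the order log, in exchange for cheap, predictable reads.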
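The CAP theorem slide urges reading up on a database's design before adopting it. One design knob worth understanding is quorum replication, which Dynamo-style stores such as Cassandra expose as tunable consistency levels (for example ONE, QUORUM and ALL). The sketch below covers only the arithmetic, assuming N replicas per item, W acknowledgements required per write and R replies required per read; `QuorumSketch` is a hypothetical name. Under strict quorums, R + W > N means every read set overlaps every write set, so at least one replica consulted by a read holds the latest acknowledged write.

```java
// Hypothetical helper illustrating Dynamo-style quorum arithmetic
// (the idea behind tunable consistency levels in stores such as Cassandra).
final class QuorumSketch {

    private QuorumSketch() {}

    // N = replicas per item, W = replicas that must ack a write,
    // R = replicas that must answer a read.
    // If R + W > N, any read quorum and any write quorum share at least one replica.
    static boolean readsOverlapWrites(int n, int w, int r) {
        validate(n, w, r);
        return r + w > n;
    }

    // How many replicas may be unreachable while writes still succeed.
    static int writeFailuresTolerated(int n, int w) {
        validate(n, w, 1);
        return n - w;
    }

    // How many replicas may be unreachable while reads still succeed.
    static int readFailuresTolerated(int n, int r) {
        validate(n, 1, r);
        return n - r;
    }

    private static void validate(int n, int w, int r) {
        if (n < 1 || w < 1 || r < 1 || w > n || r > n) {
            throw new IllegalArgumentException("need 1 <= W <= N and 1 <= R <= N");
        }
    }
}
```

With N = 3 and W = R = 2, reads overlap writes and each side tolerates one unreachable replica. With W = R = 1, operations stay available with two replicas down, but a read may return stale data. That is the CAP trade-off in miniature: when replicas cannot reach each other, the system either refuses some requests or risks answering with stale data.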