BigData Workshop Introduction Session - Ahmedabad Java Meetup

  1. For Ahmedabad Java Meetup Group (300+ members strong now!)
     Big Data Workshop: an introduction and workshop launch session
     May, 2014
     Dhruv Gohil, from Ishi systems
  2. Welcome!
     ● Why a workshop and not a presentation?
     ● What should you do in the workshop?
     ● What is expected from you in this session?
     ● What should you expect from this session?
     ● What are the upcoming sessions going to be like?
  3. Seems too serious? Now, this is much better! So, let's change the font!
  4. OK... So what are we gonna do today?
     ➔ Workshop setup and series introduction
     ➔ Already done! (See, it's easy!)
     ➔ Big is not only 'big'.
     ➔ Why do we need 'Big data'?
     ➔ What is 'Big data' NOT?
     ➔ Fear of Big data? Kick it off!
  5. Let me tell you a story...
     http://en.wikipedia.org/wiki/Information_Management_System
  6. If you still think in terms of 'Entities' and 'Tables':
     everything you have been taught in college about databases is ALL WRONG.
     http://slideshot.epfl.ch/play/suri_stonebraker
  7. Big Data is...
  8. Big Data is not only 'big'
     Volume, Velocity, Variety
     GB/TB vs PB/EB
     Centralized vs Distributed
     Structured vs Semi-Structured/Unstructured
     Data Model vs Schema
     Known relationships vs Flexible associations
  9. What is 'Big data' NOT?
     Hadoop exists because there is Big data; Big data does not exist because there is Hadoop!
  10. What is 'Big data' NOT?
     Applying for a job here? Anything less than Hadoop is as good as an insult!
  11. What is 'Big data' NOT?
     Why does Hadoop always come to mind with big data?
     What else should we know?
     Tools vs Methodologies
     Being too futuristic vs. being practical/economical
  12. Big Data in your organization
     http://www.fakingnews.firstpost.com/2014/04/transcript-of-rahul-gandhis-interview-for-job-of-a-c-programmer/
     We brought RTSC. Right To Source Code. Now, deal with it.
  13. Big Data in your organization
     ➢ The cost of tools/software decreases, but the cost of knowledge increases
     ➢ Being agile is the only way to deal with competition
     ➢ Are you working with...
        ✔ Social networking and media
        ✔ Mobile devices
        ✔ Internet transactions
        ✔ Networked devices and sensors
  14. Big Data in your product/service
     ● You have to think in terms of access rather than storage
     ● Design based on when/where data is used, not when/where data is produced
     ● Use redundancy rather than minimizing storage cost
     ● Understand NoSQL = Not Only SQL
        ✔ Streams
        ✔ In-memory analytics
        ✔ Massively parallel processing (data crunching)
  15. Big Data in your project
     Random Research says..
     ➔ 99% of your clients who asked for a Big Data project ended up with fewer total paying customers than you have fingers.
     A project hits business scalability limits much, much earlier than technical scalability limits.
  16. Big Data for your clients
     ➢ Business first, technology second
     ➢ Current reality for client projects:
        ✔ Use big data tools that work at small scale :-)
        ✔ Design with the domain in mind, not the database the client suggests
     ➢ Always design with read optimization in mind (the golden rule)
  17. Big Data project for small data customers
     If you can do it in PostgreSQL, then do it in PostgreSQL (the blue elephant rule)
  18. A few important tips...
  19. The CAP theorem: basics of NoSQL databases
     Read a lot about the design of a database before using any non-traditional database.
     Or read good negative posts to know when NOT to use it, e.g.: http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
  20. Now... the good parts!
     It's your time to speak now!
     Workshop session: practical selection of technology and design for real-world use cases.
  21. All references used in the workshop
     ➔ Basic Hadoop introductory material: http://www.coreservlets.com/hadoop-tutorial/
     ➔ Evaluate Hadoop without installation: http://go.cloudera.com/cloudera-live.html
     ➔ PostgreSQL good parts: http://www.slideshare.net/Aveic/postgresql-34323147
     ➔ PostgreSQL as NoSQL column store: http://postgresguide.com/sexy/hstore.html
     ➔ PostgreSQL for basic Elasticsearch functionality: http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/
     ➔ Good big-data-compatible OSS software: http://netflix.github.io/
     ➔ Practical HBase usage: https://www.facebook.com/UsingHbase
     ➔ Using Cassandra for write-heavy applications: http://www.datastax.com/1-million-writes
     ➔ Online analytics in Storm: http://hortonworks.com/hadoop/storm/
     ➔ E-commerce domain-specific use case: http://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376
     ➔ Good use case of selecting a data store based on a proper understanding of the CAP theorem: http://tech-blog.flipkart.net/2013/01/nosql-for-a-user-engagement-platform/
     ➔ Recommendation engine in Big Data scenarios: http://www.slideshare.net/hava101/recommendations-play-flipkart-14115791
     ➔ High-volume log processing: http://www.splunk.com/view/product-tour/SP-CAAAAGV
     ➔ Open source alternatives: http://logstash.net/ and http://graylog2.org/
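
The advice on the "Big Data in your product/service" slide (design around where data is read, and accept redundancy) can be illustrated with a small sketch. This is only a toy, in-memory example and not anything shown in the workshop: the `Order` and `OrderStore` names, the two read paths, and the sample data are all hypothetical. The point is that each write is stored once per read path, so every query becomes a single lookup instead of a scan.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    # Hypothetical record type, used only for this illustration.
    order_id: int
    user: str
    product: str
    amount: float


class OrderStore:
    """Toy store that keeps one copy of each order per read path."""

    def __init__(self):
        self.by_user = defaultdict(list)      # serves "orders for a user"
        self.by_product = defaultdict(list)   # serves "revenue for a product"

    def write(self, order):
        # Pay at write time: the same record is stored twice (redundancy).
        self.by_user[order.user].append(order)
        self.by_product[order.product].append(order)

    def orders_for_user(self, user):
        # One lookup, no scan over all orders.
        return list(self.by_user.get(user, []))

    def revenue_for_product(self, product):
        return sum(o.amount for o in self.by_product.get(product, []))


store = OrderStore()
store.write(Order(1, "asha", "book", 250.0))
store.write(Order(2, "asha", "pen", 20.0))
store.write(Order(3, "ravi", "book", 250.0))

print([o.order_id for o in store.orders_for_user("asha")])
print(store.revenue_for_product("book"))
```

The trade-off is deliberate: storage roughly doubles and writes do more work, in exchange for reads that match the access pattern directly.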

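
The CAP theorem slide asks you to understand a database's design before choosing it. As a rough illustration of the trade-off, here is a toy two-replica store. It is a heavily simplified model under stated assumptions, not how any real database is implemented: `TwoNodeStore`, the `"CP"`/`"AP"` mode strings, and the `partitioned` flag are all made up for this sketch. During a network partition, the CP variant refuses writes to stay consistent, while the AP variant stays available and lets the replicas diverge.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoNodeStore:
    """Toy two-replica store; mode is "CP" or "AP" (illustrative labels)."""

    def __init__(self, mode):
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True means a and b cannot talk to each other

    def write(self, replica, key, value):
        if not self.partitioned:
            # Healthy network: replicate synchronously to both nodes.
            self.a.data[key] = value
            self.b.data[key] = value
            return True
        if self.mode == "CP":
            # Prefer consistency: reject the write rather than diverge.
            return False
        # Prefer availability: accept locally; replicas may now disagree.
        replica.data[key] = value
        return True

    def read(self, replica, key):
        return replica.data.get(key)


cp = TwoNodeStore("CP")
cp.write(cp.a, "cart", 1)
cp.partitioned = True
cp_accepted = cp.write(cp.a, "cart", 2)

ap = TwoNodeStore("AP")
ap.write(ap.a, "cart", 1)
ap.partitioned = True
ap_accepted = ap.write(ap.a, "cart", 2)

print(cp_accepted, cp.read(cp.a, "cart"), cp.read(cp.b, "cart"))
print(ap_accepted, ap.read(ap.a, "cart"), ap.read(ap.b, "cart"))
```

Which behaviour is acceptable depends on the domain: a shopping cart can often tolerate temporarily divergent replicas, while an account balance usually cannot.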