Big data technologies with Case Study Finance and Healthcare

Big Data
Hadoop
NoSQL databases and their types: column-oriented, document-oriented, map-based (key-value)
MapReduce example
Big Data analytics case study
Case study: R
Retail and finance case study

Published in: Software

Transcript of "Big data technologies with Case Study Finance and Healthcare"

  1. The term Big Data refers to data characterized by the 5 Vs. Volume: a large volume of data. Velocity: the amount of data received per second. Variability: the level of unintentional modification affecting data quality throughout the data's lifecycle. Value: the value derived from the data. Variety: the wide range of data types received, such as video, audio, text and images.
  2. Source examples for the 5 Vs. Volume: YouTube, where a large volume of video feeds is received and maintained, as on many other video sites such as Vimeo. Variety: the wide variety of data (text, audio, video, images) received by sites such as Facebook, Twitter and other social media platforms. Velocity: the speed at which data is received by sites such as Twitter and Facebook (1 billion people all feeding their data into one site).
  3. Batch processing vs. real-time processing. Batch jobs run at a particular time of day, such as nightly or morning jobs, depending on the slack time when the server has less load. But people now want to see status in real time: in transportation, for example, when a bus is arriving at a particular stand; or in retail, where real-time advertisements are needed as soon as customers update their status. This demand is driving the move towards Big Data.
  4. Problems differentiated by the 5 Vs. Velocity: with a large volume of data arriving and a quick turnaround (low latency) required to reflect the data posted on Facebook, can it be managed by a regular DBMS? A DBMS maintains ACID properties and enforces many constraints, such as primary keys, foreign keys and check constraints. When quick turnaround or short latency is required, these constraints add processing time and increase the storage volume required. So these sites have their own file-based, DBMS-like storage systems that do not have these constraints. All data is maintained in files; the IDs assigned to files are indexed, and the files are regularly moved. Publicly known examples include Cassandra, developed at Facebook and open sourced, and Google's BigTable, which is proprietary but described in a published paper. Most of these databases are popularly categorized as NoSQL databases.
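  5. NoSQL technologies, the companies behind them, and how they were open sourced:
     - Cassandra: DataStax; open sourced as Apache Cassandra; used by Facebook, LinkedIn and Twitter.
     - BigTable: Google; proprietary to Google (Apache HBase is an open-source system modeled on it).
     - HBase: Apache; Apache HBase, used by many companies (the most popular).
     - MongoDB: MongoDB Inc.; open sourced under the AGPL; written mainly in C++.
     - Couchbase: Couchbase Inc.; Apache license; written partly in Erlang.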
  6. NoSQL database categories:
     Column-oriented: Accumulo, Cassandra, HBase
     Document: Clusterpoint, CouchDB, Couchbase, MarkLogic, MongoDB
     Key-value: Dynamo, FoundationDB, MemcacheDB, Redis, Riak, FairCom c-treeACE
     Graph: Allegro, Neo4j, OrientDB, Virtuoso, Stardog
     - Column-oriented databases store values column by column rather than row by row as in a traditional RDBMS.
     - This leads to better compression of the data, so less space is required to store the database.
     - Still higher compression can be achieved with probabilistic databases.
     - Similarly, document-oriented databases store and arrange data in the form of documents.
     - Key-value stores hold data as a collection of key-value pairs, allowing pairs to be added and deleted.
     - Graph databases: every element holds direct pointers to its adjacent elements, so no index lookup is required.
  7. See the link below: http://sandyclassic.wordpress.com/2013/07/02/data-warehousing-business-intelligence-and-cloud-computing
  8. As we have seen, Big Data solves the problems of the 5 Vs, such as the huge (V)olume of storage required by video sites like YouTube. It is changing how we perceive, visualize and analyze data: for example, HBase is used for data storage and Mahout is used to run analytics and find patterns. These databases hold a variety of data requiring different kinds of processing, which cannot be achieved by traditional RDBMS-based products. Example link: http://sandyclassic.wordpress.com/2013/06/18/gini-coefficient-of-economics-and-roc-curve-machine-learning/
  9. The MapReduce algorithm, created by Google researchers, was the starting point for everything we see in Big Data. The map phase divides the work into multiple parallel tasks, each filtering its input and emitting key-value pairs; these are sorted and grouped into queues by key (say, one queue for each name). The reduce component then aggregates or summarizes the data from these multiple units.
  10. Since the data is mostly unstructured, the best way to analyze it is through analytics, and here comes a new career called data scientist. The skill set required for a data scientist: mathematics (mostly statistics), computer science, and domain knowledge such as sociology (for social media analysis).
  11. One application of Big Data has been gathering feedback about a product from social media. The sample project report below shows how social media can be analyzed and which tools can be used: http://www.slideshare.net/SandeepSharma65/social-media-analysis-project
  12. Hadoop distributes load among the many machines (nodes) of a cluster. Clustering can happen at several levels: database clusters, OS clusters and application/web-server clusters. Here we are dealing with an OS-like distributed file system (DFS). The Hadoop Distributed File System (HDFS), developed largely at Yahoo, plays the role that the Google File System plays at Google: it provides quick storage and retrieval of data in the form of files and is used by many social media platforms.
  13. R is an open-source statistical analysis language with built-in statistical constructs used for data analysis. Java data mining APIs, .NET data mining APIs and Python libraries are also used to mine data and understand trends. Pig is another Apache Hadoop-based system, providing a high-level language for analyzing large data sets.
  14. Data Science: http://thedatascience.wordpress.com/
     Big Data: http://thebigdatatrends.wordpress.com
     Data Science Blog 2: http://thedatascientistview.blogspot.ie/
  15. Retail generates a huge amount of data: products positioned on different shelves in the store, replenishment levels, reorder levels, merchandising and assortment planning. Most of this data is usually structured, since many systems are automated, but there are also many forms, customer feedback, planning data, and emails and other chat platforms to analyze. Large retail warehouses need to plan positioning and containers in the aisles. Trends from social media can be analyzed to find customer preferences for products and offers. On retail innovation, read: http://sandyclassic.wordpress.com/2013/10/26/retail-sector-innovations/
  16. Retail uses many sensors to track items within warehouses and inside stores. The huge volume of real-time data (video, text and other forms) generated every millisecond by sensors embedded across every store and warehouse is best analyzed in a Hadoop- or Big Data-based system.
  17. Finance being a game of numbers, huge amounts of data from books of accounts, P&L statements, balance sheets, etc. accumulate for different businesses over time. Most books are structured, and hence so is the data, but Hadoop offers huge scalable clusters to quickly analyze structured data as well. Social media interest in a share or any other instrument is also reflected in the numbers. Spreadsheets are a popular medium of analysis, and other textual forms can be better analyzed if available on Hadoop-like clusters, as a kind of semi-structured data analysis.
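Slide 3 contrasts batch jobs with real-time processing. The Python sketch below, assuming a hypothetical stream of `(stop_id, delay_minutes)` bus-arrival events, computes the same average delay per stop in both styles: the batch function waits for the full data set, while the streaming class updates its state on every event so a current answer is always available. `batch_average_delay` and `StreamingAverageDelay` are illustrative names, not part of any framework.

```python
# Minimal sketch (assumed event format): batch vs. real-time (streaming) processing.
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, float]  # hypothetical (stop_id, delay in minutes)

def batch_average_delay(events: List[Event]) -> Dict[str, float]:
    """Batch style: wait until all events are stored, then compute once."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for stop, delay in events:
        totals[stop] += delay
        counts[stop] += 1
    return {stop: totals[stop] / counts[stop] for stop in totals}

class StreamingAverageDelay:
    """Real-time style: update state as each event arrives; query any time."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)
        self.counts: Dict[str, int] = defaultdict(int)

    def update(self, event: Event) -> float:
        stop, delay = event
        self.totals[stop] += delay
        self.counts[stop] += 1
        return self.totals[stop] / self.counts[stop]  # current average for this stop

events = [("stand_A", 4.0), ("stand_B", 1.0), ("stand_A", 2.0)]
stream = StreamingAverageDelay()
for e in events:
    print(e[0], stream.update(e))    # an answer is available after every event
print(batch_average_delay(events))   # an answer is available only after the "nightly" run
```

Real systems replace the in-memory dictionaries with a stream processor and a persistent store; the point here is only when the computation happens relative to data arrival.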
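Slide 4 describes file-based, DBMS-like storage in which records live in files, their IDs are indexed, and no ACID constraints are enforced. The slide does not give a file layout, so this sketch assumes a simple append-only log plus an in-memory index of byte offsets (a Bitcask-style design chosen only for illustration); it is not how Cassandra or BigTable actually lay out data. `AppendOnlyStore` is a hypothetical class.

```python
# Toy file-based store (assumed design): append-only log + in-memory offset index.
import json
import os
import tempfile
from typing import Dict, Optional

class AppendOnlyStore:
    def __init__(self, path: str) -> None:
        self.path = path
        self.index: Dict[str, int] = {}  # record id -> byte offset of latest version
        open(self.path, "ab").close()    # create the log file if missing

    def put(self, record_id: str, value: dict) -> None:
        line = (json.dumps({"id": record_id, "value": value}) + "\n").encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # position where this record starts
            f.write(line)
        # No primary/foreign-key or check constraints: just remember the offset.
        self.index[record_id] = offset

    def get(self, record_id: str) -> Optional[dict]:
        offset = self.index.get(record_id)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            return json.loads(f.readline())["value"]

with tempfile.TemporaryDirectory() as d:
    store = AppendOnlyStore(os.path.join(d, "posts.log"))
    store.put("post:1", {"user": "alice", "text": "hello"})
    store.put("post:1", {"user": "alice", "text": "hello, edited"})
    print(store.get("post:1"))  # {'user': 'alice', 'text': 'hello, edited'}
```

Writes never check constraints and never rewrite old data, which keeps them cheap; the trade-off is that stale versions accumulate in the file and would eventually need compaction.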
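Slide 6 states that storing values column by column compresses better than row-by-row storage. A minimal Python sketch of why, using run-length encoding (one common columnar compression technique, chosen here as an assumption) on a small hypothetical sales table: in the row layout, values from different columns are interleaved and rarely repeat, while each stored column contains long runs of equal values.

```python
# Sketch: run-length encoding on a row layout vs. a column layout (hypothetical data).
from itertools import groupby
from typing import Any, List, Tuple

rows = [  # (store, country, quantity)
    ("s1", "IE", 5),
    ("s2", "IE", 5),
    ("s3", "IE", 7),
    ("s4", "UK", 7),
]

def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]

# Row-oriented: all values of row 1, then row 2, ... (columns interleaved).
row_layout = [value for row in rows for value in row]
# Column-oriented: each column stored contiguously.
column_layout = [list(col) for col in zip(*rows)]

row_runs = run_length_encode(row_layout)
column_runs = [run_length_encode(col) for col in column_layout]

print(len(row_runs))                           # 12: nothing to collapse
print(sum(len(runs) for runs in column_runs))  # 8: 4 + 2 + 2 runs
print(column_runs[1])                          # [('IE', 3), ('UK', 1)]
```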
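Slide 9 outlines the mapper and reducer roles. Here is a single-machine Python sketch of the classic word-count pattern, assuming each input split is a string of names: mappers run in parallel threads and emit `(name, 1)` pairs, a shuffle step groups the pairs into one queue per name, and the reducer sums each queue. The function names are illustrative; in Hadoop, mappers and reducers run on different nodes and the framework itself performs the shuffle and sort.

```python
# Single-machine MapReduce sketch (word count over names).
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

def mapper(chunk: str) -> List[Tuple[str, int]]:
    """Map: filter one input split and emit (key, value) pairs."""
    return [(name.lower(), 1) for name in chunk.split() if name.isalpha()]

def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle/sort: route every pair to one queue per key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))

def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate all values that share a key."""
    return key, sum(values)

def map_reduce(splits: List[str]) -> Dict[str, int]:
    with ThreadPoolExecutor() as pool:  # mappers run in parallel
        mapped = list(pool.map(mapper, splits))
    return dict(reducer(k, vs) for k, vs in shuffle(mapped).items())

splits = ["Alice Bob alice", "Bob Carol 42", "alice"]
print(map_reduce(splits))  # {'alice': 3, 'bob': 2, 'carol': 1}
```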
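Slide 11 mentions gathering product feedback from social media. One very simple approach, swapped in here for illustration and not taken from the linked report, is lexicon-based sentiment scoring: count positive and negative words in each post and label the post accordingly. The word lists and posts below are hypothetical placeholders.

```python
# Lexicon-based sentiment sketch (hypothetical word lists and posts).
import re
from collections import Counter
from typing import Dict, List

POSITIVE = {"love", "great", "good", "fast"}    # hypothetical lexicon
NEGATIVE = {"hate", "bad", "slow", "broken"}    # hypothetical lexicon

def score_post(text: str) -> int:
    """+1 for each positive word, -1 for each negative word."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

def summarize_feedback(posts: List[str]) -> Dict[str, int]:
    """Label each post positive/negative/neutral and count the labels."""
    labels: Counter = Counter()
    for post in posts:
        s = score_post(post)
        labels["positive" if s > 0 else "negative" if s < 0 else "neutral"] += 1
    return dict(labels)

posts = [
    "Love the new phone, battery is great",
    "Screen arrived broken, support was slow",
    "Ordered the blue one",
]
print(summarize_feedback(posts))  # {'positive': 1, 'negative': 1, 'neutral': 1}
```

Word counting ignores negation and sarcasm, so production analyses typically use curated lexicons or trained models; at Big Data scale the per-post scoring is what would run inside the map phase.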
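Slide 12 describes HDFS storing files across a cluster. HDFS splits each file into large fixed-size blocks (128 MB by default in Hadoop 2.x and later) and keeps several replicas of each block (3 by default) on different nodes. The sketch below shows the idea with a deliberately simplified round-robin placement; real HDFS placement is rack-aware, and the node names are hypothetical.

```python
# Simplified sketch of HDFS-style block splitting and replica placement.
from typing import Dict, List

def split_into_blocks(file_size: int, block_size: int) -> List[int]:
    """Return the size of each block; the last block may be partial."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

def place_replicas(num_blocks: int, nodes: List[str], replication: int = 3) -> Dict[int, List[str]]:
    """Round-robin placement (an assumption; real HDFS is rack-aware):
    each block's replicas go to distinct nodes, rotating the start node."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

MB = 1024 * 1024
blocks = split_into_blocks(300 * MB, 128 * MB)  # 128 MB, 128 MB, 44 MB
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
for block_id, size in enumerate(blocks):
    print(block_id, size // MB, "MB ->", placement[block_id])
```

Because every block has copies on several nodes, a read can be served by whichever replica is closest or least loaded, and losing one node does not lose data.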
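Slide 15 lists replenishment and reorder levels among retail data. A common textbook way to set a reorder level, assumed here rather than taken from the slide, is the reorder point: ROP = mean daily demand × lead time + z × stdev of daily demand × √(lead time), where z sets the service level (about 1.65 for roughly 95%, under a normal-demand assumption). The sales figures are hypothetical.

```python
# Reorder-point sketch (textbook formula; z and sales data are assumptions).
import math
from statistics import mean, stdev
from typing import List

def reorder_point(daily_sales: List[float], lead_time_days: float, z: float = 1.65) -> float:
    """ROP = average daily demand * lead time + safety stock, where
    safety stock = z * stdev(daily demand) * sqrt(lead time).
    z = 1.65 is an assumed ~95% service level; needs >= 2 data points."""
    avg = mean(daily_sales)
    safety_stock = z * stdev(daily_sales) * math.sqrt(lead_time_days)
    return avg * lead_time_days + safety_stock

def needs_reorder(on_hand: float, on_order: float, rop: float) -> bool:
    """Trigger replenishment when the inventory position falls to the ROP."""
    return on_hand + on_order <= rop

sales = [20, 22, 18, 25, 15, 20, 20]  # hypothetical units sold per day
rop = reorder_point(sales, lead_time_days=4)
print(round(rop, 1), needs_reorder(on_hand=85, on_order=0, rop=rop))
```

Across many stores and thousands of products, the demand statistics feeding this formula are exactly the kind of aggregation that a Hadoop cluster can compute in bulk.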
