Big Data Analytics 2013
 


How you can create and develop powerful Big Data Analytics solutions based on Open Source software

Usage Rights

© All Rights Reserved


    Big Data Analytics 2013 Presentation Transcript

    • BIG DATA Open Source Analytics
    • Table of Contents: Stratebi Introduction; Introduction to Big Data; Current issues; Scalability; Databases; Big Data history; Big Data diagram; Tools; Hadoop; HBase; Hive; So… what else?
    • About us - Stratebi
    • Customers trusting in Open Source Business Intelligence: Private Sector, Public Sector
    • Open Source Big Data - Stratebi Understanding information…Understanding information…
    • Open Source Big Data - Stratebi Data was not stored Beginning of the use of DBs and basic reports Business Intelligence. Great variety of visual resources to analyze data
    • Open Source Big Data- Stratebi Data analysis profits: Competitive advantages Customer satisfaction evaluation Business process improvement Increase sales …
    • Open Source Big Data - Stratebi New data analysis techniques and processesNew data analysis techniques and processes New BI solutions New visual resources New data sources Cloud solutions Latest trends Social Intelligence Mailing intelligence …
    • Open Source Big Data - Stratebi Corporations and organizations noticeCorporations and organizations notice that…that…
    • Open Source Big Data - Stratebi Data analysis to increase performance and be faster
    • Open Source Big Data - Stratebi Lap telemetry Monaco Grand Prix (total 78 laps)
    • Open Source Big Data - Stratebi So…So… What is consideredWhat is considered Big DataBig Data??
    • Open Source Big Data - Stratebi Big Data ArchitectureBig Data Architecture
    • Open Source Big Data - Stratebi Scalability Vertical + CPU + RAM Data types Structured Unstructured Current challengesCurrent challenges Horizontal More nodes
    • Open Source Big Data - Stratebi Unstructured Structured Data typesData types A data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently. List: http://en.wikipedia.org/wiki/List_of_data_structures Primitive data types: Boolean, chart, float, double … Unstructured information refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner.
    • Open Source Big Data - Stratebi Data read High data read cost in JOINS Massive Joins Relational model Current challengesCurrent challenges Transactional Are transactions required and consistent? Can it be represented as a relational model?
    • Open Source Big Data - Stratebi Types of Big Data DBs. Not Only SQL (NoSQL)Types of Big Data DBs. Not Only SQL (NoSQL) In response to these problems a NoSQL paradigm appeared. NoSQL is not a substitute for relational databases Instead it is used in other specific scenarios Not all problems can be solved using a RDBMS Developer has a range of possibilities and can select the best to deal with a specific problem There are several NoSQL systems focusing on typical issues (scaling, increasing performance…) in a different way
    • Open Source Big Data - Stratebi Types of Big Data DBs. Not Only SQL (NoSQL)Types of Big Data DBs. Not Only SQL (NoSQL) Key-Value data stores Columnar databases Document-oriented databases Graph databasesObject oriented databases Do not replace relational model. Specific scenarios.Do not replace relational model. Specific scenarios.
    • Open Source Big Data - Stratebi Key-Value stores Easy to use Value stored in a collection of binary data (BLOB) Content is not relevant to database, only the key and its associated value are important No schema required (columns, data types) to store information Scalability: from key X to X+100 in Server 1, from X+101 to X+200 in Server2
    • Open Source Big Data - Stratebi Document-oriented databases Key-value store with the special feature that store is not stored with a predefined format and not as a binary field.
    • Open Source Big Data - Stratebi Object oriented databases Systems in which information is represented in the form of objects Based in OID and not in primary keys Hierarchical relations can be represented Object-oriented database management systems never had the expected impact, but have several market niches such as some scientific applications
    • Open Source Big Data - Stratebi Graph databases Graph structures with nodes, edges, and properties used to represent and store data Compared with relational databases, graph databases are often faster for associative data sets Only useful if your data can be represented using a network
    • Open Source Big Data - Stratebi Columnar databases Column databases store data tables as sections of columns of data rather than as rows of data. Reduce read time Inefficient on writing operations Used in data warehouses and Business Intelligence systems Ideal for calculating indicators over aggregated data
    • Open Source Big Data - Stratebi Are these DBs?Are these DBs?
    • Open Source Big Data - Stratebi A brief historical review… First Google implementations needed multiplying huge matrices to calculate PageRanks In order to manage big data sets algorithms and frameworks capable of processing terabytes were created An early application able to carry out MapReduce data processing paradigm was implemented in Hadoop, initially designed by Doug Cutting
    • Open Source Big Data - Stratebi Software framework that supports distributed applications, licensed under the Apache v2 license. Hadoop was derived from Google's MapReduce and Google File System papers is the largest contributor to the project Written in the Java programming language Hadoop is based in a file system and is not a database About Apache HadoopAbout Apache Hadoop
    • Open Source Big Data - Stratebi About Apache HadoopAbout Apache Hadoop
    • Open Source Big Data - Stratebi Why use Hadoop?Why use Hadoop? Need to compress data Nodes fail every day Common infrastructure Efficient Easy to use Open Source
    • Open Source Big Data - Stratebi Why use Hadoop?Why use Hadoop?
    • Open Source Big Data - Stratebi Common usesCommon uses Searches Log processing Recommendation systems Analytics (Facebook, Linkedin) Image and video processing (NASA) Data retention
    • Open Source Big Data - Stratebi Hadoop ComponentsHadoop Components
    • Open Source Big Data - Stratebi HDFS file systemHDFS file system
    • Open Source Big Data - Stratebi HDFS file systemHDFS file system Hadoop Distributed File System (HDFS) is a distributed file system Each node in a Hadoop instance typically has a single data node Uses the TCP/IP layer for communication Achieves reliability by replicating the data across multiple hosts Data nodes can talk to each other to rebalance data, to move copies around, and to keep the replication of data high
    • Open Source Big Data - Stratebi MAP ReduceMAP Reduce Consists in a Job Tracker Job Tracker assigns a task to idle Task Tracker nodes in the cluster
    • Open Source Big Data - Stratebi How to do MapReduce?How to do MapReduce? Map The Map function is applied in parallel to every pair in the input dataset and produces a list of pairs for each call Map (key1, value1) –> list (key2, value2)
    • Open Source Big Data - Stratebi How to do MapReduce?How to do MapReduce? Reduce Reduce phase collects all pairs with the same key from all lists and groups them together, creating one group for each key Reduce function is then applied in parallel to each group created by Map() function and produces a collection of values in the same domain Thus the MapReduce framework converts a list of (key, value) pairs into a list of values Reduce (key2, list(value2)) –> list(value3)
    • Open Source Big Data - Stratebi MapReduceMapReduce
    • Open Source Big Data - Stratebi MapReduceMapReduce
    • Open Source Big Data - Stratebi MapReduce WordCount exampleMapReduce WordCount example
    • Open Source Big Data - Stratebi MapReduce WordCount exampleMapReduce WordCount example bin/hadoop jar hadoop-*-examples.jar wordcount [-m <#maps>] [-r <#reducers>] <in-dir> <out-dir>
    • Open Source Big Data - Stratebi MapReduce WordCount exampleMapReduce WordCount example
    • Open Source Big Data - Stratebi Sounds difficultSounds difficult Are there anyAre there any tools to help us?tools to help us?
    • Open Source Big Data - Stratebi What is HBase?What is HBase? HBase is an open source distributed database modeled after Google's BigTable Hbase allows linear scaling by adding more servers to the system Runs on top of HDFS, providing BigTable-like capabilities for Hadoop HBase is written in Java
    • Open Source Big Data - Stratebi What is HBase?What is HBase? Hbase is suitable when you require high read/write speeds in a BigData infrastructure. HBase is able to store enormous tables (billions of rows and millions of columns) in a cluster composed by basic nodes Working modes
    • Open Source Big Data - Stratebi What is HBase?What is HBase? Hbase commands
    • Open Source Big Data - Stratebi What is Hive?What is Hive? Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis Provides an SQL-like language called HiveQL while maintaining full support for map/reduce Built-in user defined functions (UDFs) to manipulate dates, strings, and other data-mining tools. Hive supports extending the UDF set to handle use- cases not supported by built-in functions
    • Open Source Big Data - Stratebi I am a complete JavaI am a complete Java noob and need help…noob and need help… What can I do?What can I do?
    • Open Source Big Data - Stratebi Graphical ETL tool included in Pentaho suite Built to help in processes of Extracting, Transporting, Transforming and Loading data. Supports deployment on single node computers as well as on a cloud, or cluster. What is Kettle?What is Kettle?
    • Open Source Big Data - Stratebi • View perspective: • Database connections • Steps • Hops • Slave server • Kettle cluster schemas • Design perspective: • Inputs • Outputs • Lookups • Transform • Joins • Scripting • Data Warehouse • Mapping • Job • Inline • Experimental
    • Open Source Big Data - Stratebi Main Big Data steps in KettleMain Big Data steps in Kettle
    • Open Source Big Data - Stratebi Word Count exampleWord Count example
    • Open Source Big Data - Stratebi Word Count exampleWord Count example Configuring MapReduceConfiguring MapReduce
    • Open Source Big Data - Stratebi Word Count exampleWord Count example Configuring MapReduceConfiguring MapReduce
    • Open Source Big Data - Stratebi Word Count exampleWord Count example
    • Open Source Big Data - Stratebi Word Count exampleWord Count example
    • Open Source Big Data - Stratebi Configuring MapReduce with HbaseConfiguring MapReduce with Hbase
    • Open Source Big Data - Stratebi Configuring MapReduce with HbaseConfiguring MapReduce with Hbase
    • Open Source Big Data - Stratebi Using Hive as data sourceUsing Hive as data source
    • Open Source Big Data - Stratebi Big Data project and Business IntelligenceBig Data project and Business Intelligence
    • Open Source Big Data - Stratebi Big Data project and Business IntelligenceBig Data project and Business Intelligence
    • Open Source Big Data - Stratebi Big Data project and Business Intelligence.Big Data project and Business Intelligence. Smart City Case StudySmart City Case Study
    • Open Source Big Data - Stratebi Visualization – Social Media dashboardsVisualization – Social Media dashboards
    • Open Source Big Data - Stratebi Visualization – Operational dashboardVisualization – Operational dashboard
    • Open Source Big Data - Stratebi Visualization – Operational dashboardVisualization – Operational dashboard
    • Open Source Big Data - Stratebi Visualization- Geographic dashboardVisualization- Geographic dashboard
    • Open Source Big Data - Stratebi Visualization – Advanced charts (Treemap, Sunburst ...)Visualization – Advanced charts (Treemap, Sunburst ...)
    • Open Source Big Data - Stratebi Stratebi is a Spanish company located in Madrid, Barcelona and with a delegation in Sao Paulo, we are a group of professionals with a wide experience in Information systems and Technologic solutions related to the field of open source software and Business Intelligence. Contact details: info@stratebi.com www.stratebi.com Phones: (+34) 917883410 - (+34) 931844325 About usAbout us