Hadoop and IoT 
Darko Marjanović 
Đorđe Stepanić 
Miloš Milovanović
AGENDA 
BIG DATA 
HADOOP AND IOT MODEL 
HADOOP 
IOT 
HADOOP DATA PROCESSING 
HIVE 
STINGER INITIATIVE 
Q&A
BIG DATA 
Big Data describes collections of data sets so large and complex that they are 
difficult to capture, process, store, search and analyze using conventional 
database systems. 
Anything that Won't Fit in Excel. 
*Definition taken from (www.bigdata-startups.com)
BIG DATA DIMENSIONS 
1992 100GB/Day 
2002 100GB/Second 
2013 28,000GB/Second 
2018 50,000GB/Second
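The figures above mix per-day and per-second rates, so they are easier to compare once converted to a common unit. The sketch below assumes each rate is constant over a full day; the variable and function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures from the slide, normalised to GB per day.
traffic_gb_per_day = {
    1992: 100,                       # 100 GB/day
    2002: 100 * SECONDS_PER_DAY,     # 100 GB/second
    2013: 28_000 * SECONDS_PER_DAY,  # 28,000 GB/second
    2018: 50_000 * SECONDS_PER_DAY,  # 50,000 GB/second
}


def growth_factor(start_year: int, end_year: int) -> float:
    """How many times larger the daily volume is in end_year than in start_year."""
    return traffic_gb_per_day[end_year] / traffic_gb_per_day[start_year]


if __name__ == "__main__":
    for year, gb in traffic_gb_per_day.items():
        print(f"{year}: {gb:,} GB/day")
    print(f"1992 -> 2013: x{growth_factor(1992, 2013):,.0f}")
```

On these numbers, the 2013 rate is roughly 24 million times the 1992 rate.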
HADOOP AND IOT
HADOOP 
Apache Hadoop is an open-source software framework for storage and large-scale 
processing of data sets on clusters of commodity hardware. 
Hadoop was created by Doug Cutting and Mike Cafarella in 2005. 
All the modules in Hadoop are designed with a fundamental assumption that 
hardware failures are common and thus should be automatically handled in software 
by the framework.
HADOOP COMPONENTS 
Hadoop common 
HDFS 
MapReduce 
YARN (Starting with Hadoop 2.x.x)
HADOOP HDFS 
The Hadoop Distributed File System (HDFS) is a distributed, scalable, and portable file system 
written in Java for the Hadoop framework.
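HDFS stores a file as a sequence of fixed-size blocks and keeps several copies of each block on different machines, which is how it tolerates the hardware failures the framework is designed to handle. The sketch below is a toy model, not HDFS code. It assumes the common defaults of a 128 MB block size and a replication factor of 3, and it places replicas round-robin, whereas real HDFS placement is rack-aware. The node names are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # assumed default block size (Hadoop 2.x)
REPLICATION = 3      # assumed default replication factor


def plan_blocks(file_size_mb: int, datanodes: list[str]) -> list[dict]:
    """Split a file into blocks and assign each block to REPLICATION distinct nodes."""
    if len(datanodes) < REPLICATION:
        raise ValueError("need at least as many datanodes as replicas")
    n_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    plan = []
    for i in range(n_blocks):
        # The last block may be smaller than BLOCK_SIZE_MB.
        size = min(BLOCK_SIZE_MB, file_size_mb - i * BLOCK_SIZE_MB)
        # Toy round-robin placement; consecutive indices are always distinct nodes.
        replicas = [datanodes[(i + r) % len(datanodes)] for r in range(REPLICATION)]
        plan.append({"block": i, "size_mb": size, "replicas": replicas})
    return plan


if __name__ == "__main__":
    for block in plan_blocks(300, ["dn1", "dn2", "dn3", "dn4"]):
        print(block)
```

Losing any single node in this model still leaves two copies of every block, which is the property the replication factor is meant to provide.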
HADOOP MAPREDUCE 
MapReduce is a programming model and an associated implementation for processing 
and generating large data sets with a parallel, distributed algorithm on a cluster.
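The programming model can be illustrated with the classic word count, run here in a single Python process. This is only a sketch of the map, shuffle and reduce steps; it does not use Hadoop's API, and on a cluster each phase would run in parallel across many nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

Because each map call sees only one line and each reduce call sees only one key, both phases can be distributed without coordination beyond the shuffle.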
HADOOP YARN 
Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management 
technology. YARN is now characterized as a large-scale, distributed operating 
system for big data applications.
HADOOP ECOSYSTEM 
The main groups of tools in the Hadoop ecosystem: 
Data Ingestion (Flume, Sqoop …) 
Data Processing (Pig, Hive, Storm …) 
Cluster Management (Ambari) 
Security (Knox)
DATA INGESTION 
Flume 
Flume is a distributed, reliable, and available service for efficiently collecting, 
aggregating, and moving large amounts of streaming event data. 
Sqoop 
Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache 
Hadoop and structured datastores such as relational databases. 
WEB HDFS REST API
FLUME EXAMPLE
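Flume organises data flow into agents, each made of a source, a channel and a sink: the source receives events, the channel buffers them, and the sink delivers them onward (for example to HDFS). The following is a toy Python model of that flow, not Flume code or configuration. The class and function names are hypothetical, and the sink is simply a list.

```python
from collections import deque


class MemoryChannel:
    """Bounded in-memory buffer between a source and a sink (toy model)."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._events = deque()

    def put(self, event: str) -> bool:
        """Accept an event from the source; return False if the channel is full."""
        if len(self._events) >= self.capacity:
            return False
        self._events.append(event)
        return True

    def take(self, batch_size: int) -> list[str]:
        """Hand up to batch_size events to the sink, oldest first."""
        batch = []
        while self._events and len(batch) < batch_size:
            batch.append(self._events.popleft())
        return batch


def run_agent(source_lines, channel: MemoryChannel, sink: list, batch_size: int = 2) -> list:
    """Push every source line through the channel into the sink."""
    for line in source_lines:
        while not channel.put(line):
            sink.extend(channel.take(batch_size))  # drain to make room
    # Flush whatever is still buffered once the source is exhausted.
    while batch := channel.take(batch_size):
        sink.extend(batch)
    return sink


if __name__ == "__main__":
    print(run_agent(["e1", "e2", "e3", "e4", "e5"], MemoryChannel(2), []))
```

The bounded channel is the point of the design: it decouples a bursty source from a slower sink while keeping events in order.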
SQOOP AND WEB HDFS API EXAMPLE
IOT
UBIQUITOUS COMPUTING & INTERNET OF THINGS 
Ubiquitous computing - a trend (wave) in computing where computers are 
spread throughout our everyday environment. 
Concept: one person - many computers 
Internet of Things - the network of physical objects accessed through the 
Internet that contain embedded technology to interact (sense and 
communicate) with their internal states or the external environment 
(Cisco definition).
INTERNET OF THINGS COMPONENTS
INTERNET OF THINGS AND BIG DATA
REAL-TIME DATA, STRUCTURED AND UNSTRUCTURED DATA GENERATED FROM INTERNET OF THINGS
INTERNET OF THINGS - FIELDS OF APPLICATION 
Production - energy savings, lower maintenance costs, prediction of 
machine failure, quality control etc. 
Logistics - efficient supply control, optimization of transport, 
environmental controls in the warehouse, JIT, lean logistics, better capacity 
utilization etc. 
Smart cities & environment - smart parking, traffic congestion, smart 
lighting, waste management, urban noise maps, air pollution etc. 
Smart agriculture 
eHealth 
and everything you can imagine...
HADOOP DATA PROCESSING 
Input: 
- Raw data files 
- No metadata 
- No schema 
Objective: 
- Perform analysis, run interactive queries 
- Explore, structure and analyze the data 
- Real-time processing (Apache Storm) 
- Visualization
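With raw input files that carry no metadata or schema, structure is typically imposed when the data is read rather than when it is written. The sketch below shows that idea on a few lines of comma-separated sensor data. The field names and types are hypothetical, and lines that do not fit the assumed schema are skipped.

```python
import csv
import io
from datetime import datetime

# Hypothetical schema applied at read time: (column name, converter).
SCHEMA = [
    ("device_id", str),
    ("timestamp", datetime.fromisoformat),
    ("temperature_c", float),
]


def read_with_schema(raw_text: str):
    """Yield one dict per well-formed line; skip lines that do not fit the schema."""
    for row in csv.reader(io.StringIO(raw_text)):
        if len(row) != len(SCHEMA):
            continue  # wrong number of fields
        try:
            yield {name: convert(value) for (name, convert), value in zip(SCHEMA, row)}
        except ValueError:
            continue  # a field could not be converted to its declared type


if __name__ == "__main__":
    raw = (
        "s1,2014-10-21T10:00:00,21.5\n"
        "broken line\n"
        "s2,2014-10-21T10:00:05,not-a-number\n"
    )
    for record in read_with_schema(raw):
        print(record)
```

The raw file is never modified, so a different schema can be applied to the same data later.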
HIVE 
Apache Hive is a data warehousing software that facilitates querying and 
managing large datasets residing in distributed storage. 
Hive provides: 
- Tools for ETL processes 
- A mechanism for imposing a structure on a variety of data formats 
- Access to files stored in HDFS or other storage systems 
- Query execution via MapReduce
HIVE ARCHITECTURE 
Data Model: 
- Tables 
- Partitions 
- Buckets 
SerDes 
Datatypes: 
- Common primitive data types (int, boolean, float, double, string, char, date, timestamp, …) 
- Complex data types (structs, maps, arrays) 
Components: 
- UI 
- Driver 
- Compiler 
- Metastore 
- Execution engine
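The data model on this slide maps onto storage in a fairly direct way: each partition of a table is a subdirectory named `column=value`, and within a partition rows are spread over a fixed number of bucket files by hashing a column. The sketch below illustrates both ideas. The warehouse path and table name are hypothetical, and the bucket function uses the integer key itself as the hash, whereas Hive's actual hash depends on the column type.

```python
def partition_dir(warehouse: str, table: str, partition: dict[str, str]) -> str:
    """Directory that holds the data files of one partition."""
    parts = "/".join(f"{column}={value}" for column, value in partition.items())
    return f"{warehouse}/{table}/{parts}"


def bucket_for(key: int, num_buckets: int) -> int:
    """Bucket index for an integer key (toy hash: the key itself)."""
    return key % num_buckets


if __name__ == "__main__":
    print(partition_dir("/user/hive/warehouse", "sensor_readings", {"dt": "2014-10-21"}))
    print(bucket_for(42, 4))
```

A query that filters on the partition column only needs to read the matching directories, which is the main reason to partition large tables.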
HIVE.NOW 
Hive defines a simple SQL-like query language, called HQL, that enables users 
familiar with SQL to query the data. 
Scalable and extensible. 
Most commonly used for: 
- Log analysis 
- Statistical analysis 
- Document indexing
HIVE SCRIPT EXAMPLE
STINGER INITIATIVE 
Stinger is the initiative to improve query execution time and increase SQL 
functionality for Apache Hive. Microsoft and Hortonworks worked actively in the 
Apache community towards completing Stinger. 
Announced in February 2013 
44 companies, 145 developers, 392,000 lines of Java code 
Hive 0.13 
Speed: Hive on Tez, vectorized query engine & cost-based optimizer 
Scale: dynamic partition loads and smaller hash tables 
SQL: CHAR & DECIMAL datatypes, subqueries for IN / NOT IN 
Improved Hive performance by up to 100x.
STINGER.NEXT 
Stinger.next continues the Stinger initiative, further improving speed, scale and SQL 
support in Hive within the open Apache Hive community. 
Main goals: 
- transactions with ACID semantics 
- sub-second queries 
- SQL:2011 Analytics 
- usability improvements 
To be delivered in the next 18 months.
HIVE ON SPARK 
Apache Spark is a fast and general engine for large-scale data processing. 
Spark powers a stack of high-level tools including Spark SQL, MLlib for machine 
learning, GraphX, and Spark Streaming. 
Hive-Spark Machine Learning Integration will allow Hive users to run machine 
learning models via Hive.
STINGER.NEXT 
*Photo taken from the official Hortonworks website (www.hortonworks.com)
Q&A 
darko@thingsolver.com 
djordje@thingsolver.com 
milosmilovanovic@outlook.com 
hadoop-srbija.com
Hadoop and IoT Sinergija 2014

Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 

Hadoop and IoT Sinergija 2014

  • 1.
  • 2. Hadoop and IoT Darko Marjanović Đorđe Stepanić Miloš Milovanović
  • 3. AGENDA BIG DATA HADOOP AND IOT MODEL HADOOP IOT HADOOP DATA PROCESSING HIVE STINGER INITIATIVE Q&A
  • 4. BIG DATA Big Data describes collections of data sets so large and complex that they are difficult to capture, process, store, search and analyze using conventional database systems. Anything that won't fit in Excel. *Definition taken from (www.bigdata-startups.com)
  • 5. BIG DATA DIMENSIONS 1992: 100 GB/day; 2002: 100 GB/second; 2013: 28,000 GB/second; 2018: 50,000 GB/second
  • 7. HADOOP Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. Hadoop was created by Doug Cutting and Mike Cafarella in 2005. All the modules in Hadoop are designed with the fundamental assumption that hardware failures are common and thus should be handled automatically in software by the framework.
  • 8. HADOOP COMPONENTS Hadoop Common, HDFS, MapReduce, YARN (starting with Hadoop 2.x.x)
  • 9. HADOOP HDFS The Hadoop distributed file system (HDFS) is a distributed, scalable, and portable file-system written in Java for the Hadoop framework.
  • 10. HADOOP MAPREDUCE MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
  • 11. HADOOP YARN Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management technology. YARN is now characterized as a large-scale, distributed operating system for big data applications.
  • 12. HADOOP ECOSYSTEM The main groups of tools in the Hadoop ecosystem: Data Ingestion (Flume, Sqoop …); Data Processing (Pig, Hive, Storm …); Cluster Management (Ambari); Security (Knox)
  • 13. DATA INGESTION Flume Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming event data. Sqoop Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. WEB HDFS REST API
  • 15.
  • 16. SQOOP AND WEB HDFS API EXAMPLE
  • 17. IOT
  • 18. UBIQUITOUS COMPUTING & INTERNET OF THINGS Ubiquitous computing: a trend (wave) in computing where computers are spread throughout our everyday environment. Concept: one person, many computers. Internet of Things: the network of physical objects accessed through the Internet, which contain embedded technology to interact (sense and communicate) with internal states or the external environment (Cisco definition).
  • 19. INTERNET OF THINGS COMPONENTS
  • 20. INTERNET OF THINGS AND BIG DATA
  • 21. REAL-TIME DATA, STRUCTURED AND UNSTRUCTURED DATA GENERATED FROM INTERNET OF THINGS
  • 22. INTERNET OF THINGS - FIELDS OF APPLICATION Production - energy savings, lower maintenance costs, prediction of machine failure, quality control etc. Logistics - efficient supply control, optimization of transport, environmental controls in the warehouse, JIT, lean logistics, better capacity utilization etc. Smart cities & environment - smart parking, traffic congestion, smart lighting, waste management, urban noise maps, air pollution etc. Smart agriculture, eHealth and everything you can imagine...
  • 23. HADOOP DATA PROCESSING Input: - Raw data files - No metadata - No schema Objective: - Perform analysis, run interactive queries - Explore, structure and analyze the data - Real-time processing (Apache Storm) - Visualization
  • 24. HIVE Apache Hive is data warehousing software that facilitates querying and managing large datasets residing in distributed storage. Hive provides: - Tools for ETL processes - A mechanism for imposing a structure on a variety of data formats - Access to files stored in HDFS or other storage systems - Query execution via MapReduce
  • 25. HIVE ARCHITECTURE Data Model: - Tables - Partitions - Buckets. SerDes. Datatypes: common primitive data types (int, boolean, float, double, string, char, date, timestamp, …) + complex data types (structs, maps, arrays). UI, Driver, Compiler, Metastore, Execution engine
  • 26. HIVE.NOW Hive defines a simple SQL-like query language, called HQL, that enables users familiar with SQL to query the data. Scalable and extensible. Most commonly used for: - Log analysis - Statistical analysis - Document indexing
  • 28. STINGER INITIATIVE Stinger is the initiative to improve query execution time and increase SQL functionality for Apache Hive. Microsoft and Hortonworks worked actively in the Apache community towards completing Stinger. Announced in February 2013. 44 companies, 145 developers, 392,000 lines of Java code. Hive 0.13: Speed: Hive on Tez, vectorized query engine & cost-based optimizer; Scale: dynamic partition loads and smaller hash tables; SQL: CHAR & DECIMAL datatypes, subqueries for IN / NOT IN. Improved Hive performance up to 100x.
  • 29. STINGER.NEXT Stinger.next is a continuation of the Stinger initiative to further improve speed, scale and SQL support in Hive within the open Apache Hive community. Main goals: - transactions with ACID semantics - sub-second queries - SQL:2011 Analytics - usability improvements. To be delivered in the next 18 months.
  • 30. HIVE ON SPARK Apache Spark is a fast and general engine for large-scale data processing. Spark powers a stack of high-level tools including Spark SQL, MLlib for machine learning, GraphX, and Spark Streaming. Hive-Spark Machine Learning Integration will allow Hive users to run machine learning models via Hive.
  • 31. STINGER.NEXT *Photo taken from the official Hortonworks website (www.hortonworks.com)
  • 32. Q&A darko@thingsolver.com djordje@thingsolver.com milosmilovanovic@outlook.com hadoop-srbija.com
  • 33. Please rate this lecture and win a Windows Phone NOKIA Lumia 1320. Help us choose the best Sinergija lecturer! Microsoft will award you: at the end of the conference, one NOKIA Lumia 1320 will be given to a randomly chosen member of the audience. Go to www.mssinergija.net, log in and cast your votes! You can rate only lectures that you attended, and only once. The more lectures you rate, the more chances you have. The winner will be announced at the official Sinergija web portal, www.mssinergija.net
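
Slide 10 describes MapReduce as a programming model for processing large data sets in parallel. As a rough illustration only (this is not code from the talk and does not use Hadoop itself), the three phases can be sketched in plain Python on a single machine. The function names and the sample sensor lines are hypothetical; on a real cluster the framework would distribute the map and reduce calls and perform the grouping step between them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key (done by the framework on a cluster)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for raw device log lines.
    sample = ["sensor a online", "sensor b online", "sensor a offline"]
    print(word_count(sample))
```

Because each map call sees only one line and each reduce call sees only one key, both phases can in principle run on different machines, which is the property the slide refers to as a parallel, distributed algorithm.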

Editor's Notes

  1. Agenda
  2. Microsoft and Hortonworks have a shared vision of open innovation in and around Apache Hadoop and a commitment to deliver that via a 100% open source platform.
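
Slides 13 and 16 of the transcript mention the WebHDFS REST API, but the example shown in the talk is not preserved in the text. As a minimal sketch of how such a request is addressed, the helper below builds a WebHDFS URL of the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`. The host name, the port 50070 (the NameNode HTTP default in Hadoop 2.x, assumed here) and the HDFS paths are illustrative assumptions, and the helper `webhdfs_url` is hypothetical, not part of any Hadoop library.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, path: str, op: str,
                params: Optional[Dict[str, str]] = None) -> str:
    """Build a WebHDFS REST URL for an absolute HDFS path and an operation."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # 'op' always comes first; extra query parameters (e.g. user.name) follow.
    query = urlencode({"op": op, **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    # Hypothetical cluster address and paths, for illustration only.
    print(webhdfs_url("namenode.example.com", 50070, "/data/iot", "LISTSTATUS"))
    print(webhdfs_url("namenode.example.com", 50070, "/data/iot/readings.csv",
                      "OPEN", {"user.name": "hdfs"}))
```

The resulting URLs can be passed to any HTTP client: `LISTSTATUS` and `OPEN` are issued as GET requests, while write operations such as `CREATE` and `MKDIRS` use PUT.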