RubiX ID - Big Data - Ruben Middeljans, Stephan Vos


Presentation by Ruben Middeljans and Stephan Vos on Big Data, given on 6 April 2016 during the RubiX ID. Join our SPIN community at http://www.meetup.com/spi-nl/ or visit www.rubix.nl. This presentation is also available on video: https://www.youtube.com/watch?v=VgMwjuq4JME


Too Big to Ignore
Joshua Moesa
Rubén Middeljans
Stephan Vos
www.rubix.nl

Agenda
• Introduction
• Facts
• Definition
• Origins
• Five V’s
• Demonstration “The Visual Face of Hadoop”
• Questions?
24-5-2016

Introduction
“To turn data into the right information and put it in the
right context to gain actionable insight and prediction.”

Facts
Every day,
2,500,000,000,000,000,000 bytes

Facts
Every minute,
• 200 million e-mails
• 3 million likes
• 250,000 photos
• 500 hours of video
• 3 million search queries
• 420,000 tweets

Definition
Big Data is a field dedicated to the processing, storage and
analysis of large collections of data that frequently originate from
disparate sources.
Processing Storage Analysis

Origins
Big Data emerged from a combination of business needs
and technology innovations.
• Analytics & Data Science
• Digitization
• Affordable Technology & Commodity Hardware
• Social Media
• Hyper-Connected Communities & Devices
• Cloud Computing

Five V’s
The characteristics that differentiate data categorized as
“Big” data are commonly known as the “Five V’s”.
Volume Velocity Variety Veracity Value

1. Too Big to Ignore Joshua Moesa Rubén Middeljans Stephan Vos www.rubix.nl

2. Agenda • Introduction • Facts • Definition • Origins • Five V’s • Demonstration “The Visual Face of Hadoop” • Questions? 24-5-2016

3. Introduction 24-5-2016 “To turn data into the right information and put it in the right context to gain actionable insight and prediction.”

4. Facts 24-5-2016 Every day, 2,500,000,000,000,000,000 bytes

5. Facts 24-5-2016 Every 2 minutes,

6. Facts 24-5-2016 Every minute, 200 million e-mails, 3 million likes, 250,000 photos, 500 hours of video, 3 million search queries, 420,000 tweets

7. Definition Big Data is a field dedicated to the processing, storage and analysis of large collections of data that frequently originate from disparate sources. 24-5-2016 Processing Storage Analysis

8. Origins Big Data emerged from a combination of business needs and technology innovations. 24-5-2016 Analytics & Data Science Digitization Affordable Technology & Commodity Hardware Social Media Hyper-Connected Communities & Devices Cloud Computing

9. Five V’s The characteristics that differentiate data categorized as “Big” data are commonly known as the “Five V’s”. 24-5-2016 Volume Velocity Variety Veracity Value

10. “The Visual Face of Hadoop” 24-5-2016

11. BDD Architecture 24-5-2016

12. Demo 24-5-2016

13. Questions? 24-5-2016

Editor's Notes

1. “To turn data into the right information and put it in the right context to gain actionable insight and prediction.” This is really the essence of Big Data, but it is also the vision behind data integration. We see the world shifting toward a data-intensive society: nearly everyone is now almost continuously connected to the internet and literally has the whole world at their fingertips; cars are connected and schedule their own maintenance; refrigerators and washing machines are always online; meters are read remotely; and sensors publish data continuously.
2. The boundaries between the physical and digital worlds are blurring; a new era has begun in which everything and everyone is continuously connected. This is about an explosion of data and the rise of mobility, the Internet of Things (IoT), social networks and cloud computing. We are surrounded by data at every moment; everything is data. There is even data about data, also called metadata. And it is anything but static: it keeps getting bigger, more dynamic and more unstructured.
3. The terms “Big Data” and “Data Science” have been elevated to buzzwords in recent years. This introduction is therefore mainly about understanding what the fundamental aspects of Big Data actually are: how can we break Big Data up into tangible chunks?
4. One of the terms I came across while delving into Big Data was “real world situational awareness”. This translates to the understanding, perception and awareness of everything that happens around us. It applies to physical, tangible things as well as to information, but also to the patterns and relationships that emerge unexpectedly within them. This is seen as the ultimate goal of Big Data as a science. A caveat is that, as an individual or organization, we do not always know what this looks like, or cannot even imagine what it should look like. In my opinion this will also be the biggest challenge in the adoption of Big Data: the realization that there is something you do not know.
[OPTIONAL] 4. According to various forecasts, as many as 4 billion people will be connected to the internet in 2020. In addition, 25+ billion devices and intelligent systems will be connected, together producing as much as 50 trillion GB of data, which is 50 zettabytes. (Today: 3 billion connected people out of 7.4 billion, 235 million dollars in revenue, 4 million apps, 5 billion connected devices and 10 trillion GB of data.)
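The unit conversion in this note (50 trillion GB being 50 zettabytes) can be checked with a few lines of arithmetic. The sketch below assumes decimal SI units, where 1 GB is 10^9 bytes and 1 ZB is 10^21 bytes, and reads the Dutch "biljoen" as 10^12. The function name is illustrative only.

```python
# Decimal (SI) byte units; binary units (GiB, ZiB) would give slightly different figures.
GB = 10**9
ZB = 10**21


def gigabytes_to_zettabytes(gigabytes: int) -> float:
    """Convert a quantity in gigabytes to zettabytes (SI units)."""
    return gigabytes * GB / ZB


forecast_2020_gb = 50 * 10**12  # 50 trillion GB, the 2020 forecast quoted in the note
current_gb = 10 * 10**12        # 10 trillion GB, the "today" figure quoted in the note

print(gigabytes_to_zettabytes(forecast_2020_gb))  # 50.0
print(gigabytes_to_zettabytes(current_gb))        # 10.0
```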
Every day, we create 2,500,000,000,000,000,000 (2.5 quintillion) bytes of data. This ranges from the data of the Curiosity rover on Mars to your latest holiday photos on Facebook. If we put this data on DVDs and stacked them, the pile would reach to the moon and back! 80% of the data is unstructured. Companies and organizations use less than 10% of the data they have available, and even less than 1% of the total data they could potentially access.
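A rough back-of-the-envelope check of the DVD comparison, as a sketch under stated assumptions: single-layer DVDs of 4.7 GB, 1.2 mm thick, stacked without cases, and a mean Earth-moon distance of 384,400 km. None of these figures come from the presentation itself. Under these assumptions one day of data gives a stack of roughly 640 km, so the moon-and-back image would correspond to about three years of accumulated data, not a single day.

```python
DATA_PER_DAY_BYTES = 2.5e18        # figure from the slide
DVD_CAPACITY_BYTES = 4.7e9         # assumption: single-layer DVD
DVD_THICKNESS_M = 1.2e-3           # assumption: 1.2 mm per disc, no cases
EARTH_MOON_DISTANCE_KM = 384_400   # assumption: mean Earth-moon distance


def dvd_stack_height_km(data_bytes: float) -> float:
    """Height in km of a stack of DVDs holding `data_bytes` of data."""
    discs = data_bytes / DVD_CAPACITY_BYTES
    return discs * DVD_THICKNESS_M / 1000


def days_to_reach_moon_and_back(bytes_per_day: float) -> float:
    """Days of data needed for the stack to span the Earth-moon distance twice."""
    return 2 * EARTH_MOON_DISTANCE_KM / dvd_stack_height_km(bytes_per_day)


print(f"Stack for one day of data: {dvd_stack_height_km(DATA_PER_DAY_BYTES):.0f} km")
print(f"Days for a moon round trip: {days_to_reach_moon_and_back(DATA_PER_DAY_BYTES):.0f}")
```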
This is so much data that every 2 minutes we create as much data as humanity created in total up to the year 2000, and we do that every 2 minutes. What is more, over 90% of the data in the world was created in the last 18 months.
Every minute, we send 200 million e-mails, 67% of which are spam. We generate 3 million Facebook likes and upload more than 250,000 photos to Facebook. We send more than 420,000 tweets to Twitter. 500 hours of video are uploaded to YouTube; it would take you 75 years to watch all the videos uploaded in a single day. Google processes an average of 40,000 search queries per second, which comes to 3.5 billion queries per day. This puts Google firmly at number 1.
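The per-minute figures in this note can be scaled up with simple arithmetic. The sketch below assumes the 40,000 search figure is a per-second rate, since that is the reading consistent with the quoted 3.5 billion queries per day. The constant and function names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60

# Figures quoted in the note
EMAILS_PER_MINUTE = 200_000_000
SPAM_PERCENT = 67
VIDEO_HOURS_PER_MINUTE = 500
SEARCHES_PER_SECOND = 40_000  # assumption: the 40,000 figure is a per-second rate


def spam_emails_per_minute() -> int:
    """Number of spam e-mails sent per minute."""
    return EMAILS_PER_MINUTE * SPAM_PERCENT // 100


def searches_per_day() -> int:
    """Search queries per day at the assumed per-second rate."""
    return SEARCHES_PER_SECOND * SECONDS_PER_DAY


def years_to_watch_one_day_of_uploads() -> float:
    """Years of non-stop viewing needed to watch one day of uploaded video."""
    hours_uploaded = VIDEO_HOURS_PER_MINUTE * MINUTES_PER_DAY
    return hours_uploaded / 24 / 365


print(spam_emails_per_minute())                      # 134000000
print(searches_per_day())                            # 3456000000
print(round(years_to_watch_one_day_of_uploads(), 1)) # 82.2
```

At these rates, one day of uploads takes about 82 years to watch, the same order of magnitude as the 75 years quoted.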
1. Big Data is not a ready-made solution (it is not a matter of simply deploying Hadoop); it is a field or discipline dedicated to the processing, storage and analysis of large volumes of data. This data often comes from different sources and can have diverse characteristics.
2. Big Data solutions are needed when traditional processing, storage and analysis technologies fall short because of the complexity or other characteristics of the data. In particular, the field addresses special requirements such as combining multiple unstructured datasets, processing large amounts of unstructured data, and harvesting hidden data, patterns and relationships within a given time frame.
3. Examples: Amazon has become so good at Big Data analytics (the predictive variant) that it is able to ship products toward you before you have ordered them. Walmart knows, for example, that more Pop-Tarts are sold when there is a storm warning. They do not know why, but they make sure they have enough stock and give the snacks a prominent place in the store. New York, as a smart city, is currently working on soundscaping: a disturbance of the typical city sound, such as a gunshot, is passed directly to the police, who can respond immediately.
1. Analytics & Data Science: Companies and organizations store more data in order to potentially gain new insights and thereby an edge over the competition. The need for the techniques and technologies that go with this has increased significantly. One example is predictive (what-if) and prescriptive analytics on large unstructured data sets. Predictive analytics: if a customer has bought products A and B, what is the probability that they will also buy product C? (probability calculation). Prescriptive analytics: from a choice of products A, B and C, which product will sell best next month if we lower the price?
2. Digitization: For many companies and organizations, digital media have replaced physical media as the de facto means of communication. Digitized data gives organizations the chance to have important and useful information at their disposal.
3. Affordable Technology & Commodity Hardware: Technology for processing and storing large amounts of data has become increasingly affordable. The use of commodity hardware makes adopting big data solutions accessible to companies without a large initial investment.
4. Social Media: The rise of social media has enabled customers to give feedback in near real time via open, public media. This development has in turn enabled companies and organizations to adjust their strategy based on that feedback.
5. Hyper-Connected Communities & Devices: The broad coverage of the internet and the spread of smartphones and WiFi networks have enabled more people to be continuously active in virtual communities, either directly through online interaction or through the use of connected devices such as televisions, RFID, refrigerators and smart meters.
6. Cloud Computing: Developments in cloud technology have led to the emergence of cloud environments. These make it possible to rent highly scalable, on-demand execution environments under a very attractive cost model (pay-as-you-go).
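The predictive-analytics question from point 1 (if a customer bought A and B, what is the chance they also buy C?) can be illustrated with a simple frequency estimate. This is a minimal sketch on made-up shopping baskets, not the method used by any of the companies mentioned; the function name and the data are hypothetical.

```python
from typing import Iterable, Optional, Set


def conditional_purchase_probability(
    baskets: Iterable[Set[str]], given: Set[str], target: str
) -> Optional[float]:
    """Estimate P(target | given) as the share of baskets that contain
    every item in `given` and also contain `target`.

    Returns None when no basket contains all `given` items.
    """
    matching = [basket for basket in baskets if given <= basket]
    if not matching:
        return None
    return sum(target in basket for basket in matching) / len(matching)


# Hypothetical purchase history: each set is one customer's basket.
baskets = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B"},
]

# Three baskets contain both A and B; two of those also contain C.
print(conditional_purchase_probability(baskets, {"A", "B"}, "C"))
```

On real data the same estimate would be computed over millions of baskets, which is where distributed processing and storage come in.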
The characteristics that distinguish “Big” data are also known as the “Five V’s”.
1. Volume: The volume of data that is processed by big data solutions.
2. Velocity: The speed at which data arrives and potentially accumulates within a short time.
3. Variety: Variety refers to the multiple types and associated formats of data processed by big data solutions. These data collections can be generated by humans or by computers, and can contain structured, semi-structured or unstructured data.
4. Veracity (quality): Veracity is simply the quality of the data: how meaningful is this data? Within big data we distinguish noise from signal. Noise has absolutely no value for the business, whereas signal contains meaningful information.
5. Value: Value is the worth of the data to the business; there is a direct relationship with the veracity, or quality, of the data. The more meaningful the information, the higher its value to the business. Value is also time-dependent and therefore also depends on the time needed to process the data into information. The longer the processing takes, the lower the value to the business, simply because the information may have passed its shelf life by then.
The Visual Face of Hadoop… Why? There is a lot of (unstructured) data in your data lake and it is not that easy to get value from all this data. Combine this with data scientists spending 80% of their time preparing the data they want to analyse and only 20% actually analysing it. That’s why Oracle came up with this layer on top of Hadoop, which makes it possible to visualise data quickly. Now data scientists can spend 20% of their time preparing the data and 80% actually analysing it.
Oracle Big Data Discovery offers true technical innovation on Hadoop, natively leveraging the power of distributed storage and computing, across servers (or nodes), to process massive amounts of information without having to move it around. Studio is the unique web-based user interface that makes it easy for anyone to find, explore, transform, discover and share data. Dgraph server is the hybrid search-analytic database that allows users to operate on in-memory data sets for interactive performance. Data processing uses Apache Spark to profile, sample, transform and enrich massive amounts of information across all the data nodes in the Hadoop cluster. Big Data Discovery is a core component in Oracle’s overall Big Data management and analytics strategy, enabling customers to:
• Use Oracle R Advanced Analytics for Hadoop for better predictive analytics
• Leverage Oracle Big Data SQL to query the data in HDFS without moving it at all
• Implement solutions on Oracle engineered systems, enabling rapid application deployment, optimized performance benefits and lower total cost of ownership
• Find relevant data quickly
• Explore the data in your data lake
• Transform and enrich the data to make it more useful
• Discover new insights in the data
• Share the new insights with teams
Let’s go and do a demo!
