Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence from Machine Logs
Eric Roch, Principal &
Ben Hahn, Senior Technical Architect
Perficient is a leading information technology consulting firm serving clients throughout
North America.
We help clients implement business-driven technology solutions that integrate business
processes, improve worker productivity, increase customer loyalty and create a more agile
enterprise to better respond to new business opportunities.
About Perficient
• Founded in 1997
• Public, NASDAQ: PRFT
• 2013 revenue $373 million
• Major market locations throughout North America
• Atlanta, Boston, Charlotte, Chicago, Cincinnati, Columbus,
Dallas, Denver, Detroit, Fairfax, Houston, Indianapolis,
Los Angeles, Minneapolis, New Orleans, New York City,
Northern California, Philadelphia, Southern California,
St. Louis, Toronto and Washington, D.C.
• Global delivery centers in China, Europe and India
• >2,100 colleagues
• Dedicated solution practices
• ~90% repeat business rate
• Alliance partnerships with major technology vendors
• Multiple vendor/industry technology and growth awards
Perficient Profile
BUSINESS SOLUTIONS
Business Intelligence
Business Process Management
Customer Experience and CRM
Enterprise Performance Management
Enterprise Resource Planning
Experience Design (XD)
Management Consulting
TECHNOLOGY SOLUTIONS
Business Integration/SOA
Cloud Services
Commerce
Content Management
Custom Application Development
Education
Information Management
Mobile Platforms
Platform Integration
Portal & Social
Our Solutions Expertise
Eric Roch
Principal
Eric leads Perficient's
national connected solutions
practice
• Includes focus on SOA/integration,
cloud, mobile and Big Data
• Author & industry speaker
• 25+ years of experience in various
aspects of information technology
including:
• Executive-level management
• Enterprise architecture
• Application development
Speakers
Ben Hahn
Sr. Technical Architect
Ben Hahn is a Sr.
Technical Architect
• Focuses on transactions, logging &
exception processing
• Author & speaker
• 20+ years of experience in various
aspects of information technology
including:
• Software solutions
• Enterprise infrastructure
• Product management
• Open Source software community
contributor
• Often defined as data that exceeds the capacity of
conventional database systems because it is too large,
moves too fast, or is too varied for traditional database systems
to handle in an architecturally cohesive way. The three V's
of Big Data are:
• Volume
• Most U.S. companies store at least 100 TB of data
• Facebook ingests 500 TB in a single day
• 40 zettabytes (40 trillion GB) of data projected by
2020
• Velocity
• NYSE captures 4-5 TB of data in a single day
• A Boeing 737 generates 243 TB in a single flight
• The Google self-driving car generates 750MB of
data per second!
• Variety
• Twitter, Clickstreams, Audio, Video
• GPS, Sensor data, Facebook content
• Infrastructure and application logs
What is Big Data?
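For a sense of scale, the self-driving-car velocity figure above converts to an hourly volume as follows, assuming the 750 MB/s rate is sustained and using decimal units (1 TB = 10^6 MB):

```latex
750~\tfrac{\text{MB}}{\text{s}} \times 3600~\tfrac{\text{s}}{\text{h}}
  = 2.7 \times 10^{6}~\tfrac{\text{MB}}{\text{h}}
  = 2.7~\tfrac{\text{TB}}{\text{h}}
```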
POLL QUESTION:
What is your current adoption level for big data?
• Evaluation
• Prototype
• Production
But Not Everyone is Google!
Where’s the Big Data coming from?
POLL QUESTION:
Have you used open source software for big data solutions?
• Yes
• No
Machine Data definitely has the three V’s of Big Data
Machine Data is Big Data
What Can We Gain From Machine Data?
Valuable information can be mined from
machine data, including:
• Transaction monitoring
• Error detection
• Behavior trends
• Audit logging
• Infrastructure states
• Anomaly detection
• Geospatial analysis
• Network analysis
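Error and anomaly detection often start with something as simple as counting errors per time bucket and flagging outliers. A minimal Python sketch, assuming log records have already been parsed into hypothetical (minute, level) pairs; the z-score rule used here is a deliberately simple choice for illustration, not a recommendation:

```python
from collections import Counter
from statistics import mean, pstdev


def errors_per_minute(records):
    """Count ERROR-level records per minute bucket.

    `records` is an iterable of (minute, level) pairs; minutes with no
    errors simply do not appear in the result.
    """
    return Counter(minute for minute, level in records if level == "ERROR")


def flag_anomalies(counts, k=2.0):
    """Return the minutes whose error count exceeds mean + k * stdev.

    A simple z-score rule over the observed counts; it ignores
    seasonality, trends and minutes with zero errors.
    """
    values = list(counts.values())
    if len(values) < 2:
        return []
    threshold = mean(values) + k * pstdev(values)
    return sorted(minute for minute, n in counts.items() if n > threshold)


sample = {"10:00": 2, "10:01": 3, "10:02": 2,
          "10:03": 3, "10:04": 2, "10:05": 30}
print(flag_anomalies(sample))  # ['10:05']
```

Production systems typically use rolling windows and baselines per time of day, but the shape is the same: aggregate first, then compare against an expected range.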
Log Analysis vs. Business Analytics
• Ingest - Versus ETL
• Big Data - Bidirectional integration with Hadoop
• Query language - MapReduce functions on unstructured
data
• Drill anywhere - Investigate all the data rather than a
predefined schema or cube
• Information discovery - Discover relationships based on
patterns in the data
• Ad-hoc versus dimensional - Log analysis is not based on a
predefined structure derived from a point-in-time set of
requirements
• Explicit logging - Versus implicit correlation
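The "drill anywhere" and "ad-hoc versus dimensional" points amount to schema-on-read: records keep whatever fields they arrived with, and structure is imposed only at query time. A minimal Python sketch, assuming hypothetical JSON log lines whose fields vary from record to record:

```python
import json

RAW_LOGS = [
    '{"ts": "2014-04-01T10:00:01", "level": "INFO", "user": "alice"}',
    '{"ts": "2014-04-01T10:00:02", "level": "ERROR", "order_id": 17, "latency_ms": 950}',
    '{"ts": "2014-04-01T10:00:03", "level": "INFO", "order_id": 18, "latency_ms": 120}',
]


def query(raw_lines, predicate, fields):
    """Parse each line at query time and project the requested fields.

    No schema is declared up front: a record missing a field yields None
    for it, and the predicate can reference any field that happens to exist.
    """
    for line in raw_lines:
        record = json.loads(line)
        if predicate(record):
            yield {f: record.get(f) for f in fields}


# An ad-hoc question nobody designed a cube for: which orders were slow?
slow_orders = list(query(RAW_LOGS,
                         lambda r: r.get("latency_ms", 0) > 500,
                         ["order_id", "latency_ms"]))
print(slow_orders)  # [{'order_id': 17, 'latency_ms': 950}]
```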
POLL QUESTION:
Do you mine machine data for business insights?
• Yes
• No
Innovations From Cloud and OSS
• Hadoop and MapReduce - Derived from Google's
MapReduce and Google File System
• Storm – Distributed event processor open sourced by
Twitter
• Presto - SQL query engine, open sourced by Facebook, built
to work with petabyte-scale data
warehouses
• Google BigQuery - Run SQL-like queries against terabytes
of data in seconds
• Amazon DynamoDB - NoSQL database service to store
and retrieve any amount of data, and serve any level of
request traffic
• Elasticsearch – Distributed full-text search engine with an open source community
POLL QUESTION:
Do you plan to use cloud-based solutions for big data?
• Yes
• No
• 2004 - Google published the paper "MapReduce: Simplified Data
Processing on Large Clusters", characterized by:
• Mapping input into intermediate key-value pairs, shuffling them by key,
and then aggregating (reducing) the intermediate pairs
• Origins in the map and reduce primitives of functional languages
• Massive parallelism and elasticity via commodity hardware
• Fault tolerance via master-worker nodes
Big Data Processing: MapReduce
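The map, shuffle and reduce phases described above can be sketched in a single process with plain Python. This only illustrates the data flow and is not Hadoop's API; the log format (level as the second whitespace-separated token) is an assumption:

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    """Map: emit (key, 1) for the log level, assumed to be the 2nd token."""
    parts = line.split()
    if len(parts) >= 2:
        yield parts[1], 1


def shuffle_phase(pairs):
    """Shuffle: group intermediate values by key, as the framework would
    do between mappers and reducers."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values for one key."""
    return key, reduce(lambda a, b: a + b, values, 0)


def map_reduce(lines):
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


logs = [
    "10:00:01 INFO  user login",
    "10:00:02 ERROR db timeout",
    "10:00:03 INFO  user logout",
]
print(map_reduce(logs))  # {'INFO': 2, 'ERROR': 1}
```

In a real cluster, the map step runs in parallel on the nodes holding each input split and the shuffle moves data across the network; because map and reduce are side-effect-free functions, a failed task can simply be re-run on another node.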
• Based on lambda (λ) calculus
• All computation and data can be expressed as a
series of functions and predicates of functions
• Declarative rather than imperative
• First-class functions – Functions can be passed as
arguments and returned as results, just like other values. This
also enables currying and partial application.
• Call by name – Function arguments are not evaluated
until they are actually used.
• Recursion – Functions are defined in terms of themselves,
replacing explicit loops; a base case ends the recursion.
• Immutable state and values – Pure functional programming
has no mutable variables, only immutable values as
they exist at a given moment in time. This has big effects on
scalability and concurrency.
• Referential transparency – An expression can be replaced by
its value without changing the program's behavior (no side effects).
• Pattern matching – Data type matching as well as data
structure decomposition and deep object type matching
• Examples: Erlang, Haskell, Lisp, Clojure, Scala
What are functional languages?
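Several of the properties listed above can be shown in a few lines of Python, which supports first-class functions even though it is not a purely functional language. The function names are illustrative only:

```python
from functools import partial


# First-class functions: functions are values that can be passed and returned.
def compose(f, g):
    """Return a new function that applies g, then f."""
    return lambda x: f(g(x))


# Partial application: fix some arguments now, supply the rest later.
def scale(factor, x):
    return factor * x


double = partial(scale, 2)


# Recursion instead of a loop: a function defined in terms of itself,
# with a base case that ends the recursion.
def total(values):
    if not values:
        return 0
    return values[0] + total(values[1:])


# Immutable values: a tuple cannot be changed in place; "updating" it
# produces a new value and leaves the original untouched.
readings = (3, 5, 7)
updated = readings + (9,)

# Referential transparency: total(readings) always yields 15, so any
# occurrence of that call could be replaced by 15 without changing the program.
double_then_total = compose(total, lambda xs: tuple(map(double, xs)))

print(double(21), total(readings), updated, double_then_total(readings))
# 42 15 (3, 5, 7, 9) 30
```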
And MapReduce is Better with
Functional Languages
Imperative Model: Pascal, C, BASIC, etc.
Evolution (or Devolution?) of Databases
Object-Oriented Programming Model: Java, C++, C#
Evolution (or Devolution?) of Databases
Functional Programming Model:
Scala, Clojure, F#
Evolution (or Devolution?) of Databases
• Because commodity hardware in the cloud is highly
elastic, the resources needed to run queries and transactions
can be scaled in response to data volumes at the
storage level.
• Data is stored using the functional programming concept of
immutability, by only appending data as point-in-time
values.
• MapReduce functions can be rebalanced and redistributed
across machines as nodes fail or new nodes are added.
• First-class functions and call by name allow functions and
lambda expressions to be passed into MapReduce calls
as arguments, so ad-hoc functionality can be added.
• Pattern matching allows very complex pattern matches
on complex structures like XML.
• Transactions use atomic operations such as compare-and-swap
to ensure ACID properties.
• SQL or query expressions can be reduced to
MapReduce functions, lambda expressions and/or
patterns and distributed in parallel across the nodes.
• Using recursion, complex structures like XML can be
mapped and reduced from a single expression.
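The append-only, point-in-time storage and compare-and-swap points can be illustrated together. This is a minimal single-process sketch of a hypothetical in-memory store; the lock stands in for the atomic compare-and-swap primitive a real distributed store would provide:

```python
import threading


class AppendOnlyStore:
    """Keeps every value ever written for a key as an immutable history.

    Writes never overwrite: each successful write appends a new
    (version, value) entry, so any earlier point in time can still be read.
    """

    def __init__(self):
        self._history = {}             # key -> list of (version, value)
        self._lock = threading.Lock()  # makes compare-and-swap atomic here

    def read(self, key, version=None):
        """Return (version, value): the latest, or as of a given version."""
        entries = self._history.get(key, [])
        if not entries:
            return 0, None
        if version is None:
            return entries[-1]
        if not 1 <= version <= len(entries):
            raise KeyError(f"{key!r} has no version {version}")
        return entries[version - 1]

    def compare_and_swap(self, key, expected_version, new_value):
        """Append new_value only if the current version is expected_version.

        Returns True on success; False means another writer got there
        first and the caller should re-read and retry.
        """
        with self._lock:
            entries = self._history.setdefault(key, [])
            current = len(entries)
            if current != expected_version:
                return False
            entries.append((current + 1, new_value))
            return True


store = AppendOnlyStore()
first = store.compare_and_swap("balance", 0, 100)   # creates version 1
stale = store.compare_and_swap("balance", 0, 999)   # rejected: version is now 1
second = store.compare_and_swap("balance", 1, 80)   # creates version 2
print(first, stale, second, store.read("balance"), store.read("balance", version=1))
# True False True (2, 80) (1, 100)
```

Because old versions are never modified, readers need no locks at all; only the append has to be coordinated.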
MapReduce Machine Data:
What Do We Need?
• A dynamic process for parsing
and mapping unstructured data
to structured data in real-time
• Wide range of data formats
(text, XML, JSON, CSV, EDI,
etc.)
• Need intelligent pattern
matching capabilities
• Ability to correlate meaningful
transactional data and metrics
from disparate data (reducing)
• Machine data is static and
immutable. Append-only fast
writes with eventual
consistency are ideal
• Need fast filter, search, query
capabilities to display results
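The first three requirements (dynamic parsing, many formats, pattern matching) can be sketched as a small heuristic dispatcher that maps text, JSON, CSV and XML lines onto one record shape. The formats, field names and regular expression are assumptions for illustration; EDI and other formats would need their own parsers:

```python
import csv
import io
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical plain-text format: "<time> <LEVEL> <message>"
TEXT_PATTERN = re.compile(r"^(?P<time>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$")
FIELDS = ("time", "level", "message")


def parse_line(line):
    """Map one raw log line to a dict with time, level and message keys."""
    line = line.strip()
    if line.startswith("{"):                      # JSON object
        obj = json.loads(line)
        return {k: obj.get(k) for k in FIELDS}
    if line.startswith("<"):                      # XML element
        elem = ET.fromstring(line)
        return {k: elem.findtext(k) for k in FIELDS}
    match = TEXT_PATTERN.match(line)
    if match:                                     # free text
        return match.groupdict()
    # Fall back to CSV: time,level,message (message may be quoted).
    row = next(csv.reader(io.StringIO(line)), [])
    if len(row) == 3:
        return dict(zip(FIELDS, row))
    return None                                   # unrecognised line


lines = [
    '10:00:01 ERROR disk full',
    '{"time": "10:00:02", "level": "INFO", "message": "retry ok"}',
    '10:00:03,WARN,"slow, 900 ms"',
    '<log><time>10:00:04</time><level>INFO</level><message>done</message></log>',
]
for record in map(parse_line, lines):
    print(record)
```

Once every source is normalized to the same record shape, the downstream counting, correlation and search steps no longer care where a line came from.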
Open Source Big Data Landscape
Source: www.bigdata-startup.com
Apache Hadoop: The Elephant in the Room
• What about Apache Hadoop?
• Apache Hadoop comprises HDFS and
Hadoop MapReduce, based on Google's GFS
and MapReduce respectively
• Batch-oriented MapReduce jobs run through
schedulers and JobTrackers
• We require real-time MapReduce processes
• We need to index, query and search data in real time
with a well-defined interface
• Hadoop can still serve as secondary storage for long-term
persistent logs – Lambda Architecture (Batch vs.
Speed Layer)
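The Lambda Architecture mentioned in the last bullet answers queries by merging a precomputed batch view (for example, rebuilt periodically by Hadoop from the long-term log store) with a small real-time view kept by the speed layer for events that arrived since the last batch run. A minimal sketch with hypothetical in-memory views:

```python
from collections import Counter


def batch_view(historical_events):
    """Batch layer: recompute counts from the complete, immutable history."""
    return Counter(e["level"] for e in historical_events)


class SpeedLayer:
    """Speed layer: incrementally count events not yet covered by a batch run."""

    def __init__(self):
        self.view = Counter()

    def on_event(self, event):
        self.view[event["level"]] += 1

    def reset(self):
        """Discard the real-time view once a new batch view has absorbed it."""
        self.view.clear()


def serve_query(batch, speed, level):
    """Serving layer: merge both views to answer with up-to-date totals."""
    return batch[level] + speed.view[level]


history = [{"level": "ERROR"}, {"level": "INFO"}, {"level": "ERROR"}]
batch = batch_view(history)
speed = SpeedLayer()
speed.on_event({"level": "ERROR"})          # arrived after the last batch run
print(serve_query(batch, speed, "ERROR"))  # 3
```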
Apache Storm: Use Real-time
MapReduce for Machine Data Streams
• Developed by BackType, which was acquired by Twitter
• Distributed computational framework that allows
real-time MapReduce functionality over any data source
stream using the concepts of Spouts and Bolts
• Read from any data stream using Spouts (Kafka,
JMS, HTTP, etc.)
• Transactional and guaranteed message processing
• Parallelism and scalability
• Fault tolerance (master-worker for MapReduce)
• MapReduce topologies
• Offers real-time MapReduce jobs (as Bolts)
• Other tools: Apache Spark
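Storm topologies are typically written against Storm's Java API (other languages are supported through its multi-lang protocol); the sketch below does not use that API. It only mimics the spout-and-bolt dataflow in plain Python, with each stage as a generator consuming the previous one's stream of tuples. All names are hypothetical:

```python
from collections import Counter


def log_spout(lines):
    """Spout: emit one tuple per incoming log line (here from a list;
    in practice from a source such as Kafka or JMS)."""
    for line in lines:
        yield (line,)


def parse_bolt(stream):
    """Bolt: split each line into (level, message), dropping malformed input."""
    for (line,) in stream:
        parts = line.split(maxsplit=2)
        if len(parts) == 3:
            _time, level, message = parts
            yield (level, message)


def error_filter_bolt(stream):
    """Bolt: pass through only ERROR tuples."""
    for level, message in stream:
        if level == "ERROR":
            yield (level, message)


def count_bolt(stream):
    """Bolt: keep a running count per message and emit the updated total
    after every tuple, so results are available as data arrives."""
    counts = Counter()
    for _level, message in stream:
        counts[message] += 1
        yield (message, counts[message])


lines = [
    "10:00:01 ERROR db timeout",
    "10:00:02 INFO user login",
    "10:00:03 ERROR db timeout",
]
topology = count_bolt(error_filter_bolt(parse_bolt(log_spout(lines))))
for message, running_total in topology:
    print(message, running_total)
# db timeout 1
# db timeout 2
```

Unlike this single-threaded chain, Storm runs many instances of each spout and bolt in parallel across the cluster and uses stream groupings (for example, a fields grouping on the message) so that tuples with the same key reach the same bolt instance and per-key counts stay consistent.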
Apache Storm: Use Real-time
MapReduce for Machine Data Streams
MapReduce: the declarative simplicity of functional languages within Storm
Elasticsearch: Distributed
Document Search
• Distributed search server engine built on Apache Lucene
• A schema-less document store using JSON as its
document format. New fields can be added dynamically.
All fields are indexed by default
• Uses index shards to distribute queries and searches
across clusters. Queries and searches are run in parallel
• A cluster can host multiple indexes, which can be queried as
a group or singly. Index aliases allow indexes to be
added or dropped dynamically
• Append-only model using versioning. Writes are very fast,
depending on the write-consistency model (wait for all shards
to be written, a quorum, or none)
• Well-defined RESTful API. Very powerful query
features
• Other tools: Apache Solr
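As an illustration of the RESTful query interface, the request body below asks for ERROR entries mentioning "timeout" in the last 15 minutes and buckets the hits by host. It is written against the query DSL of current Elasticsearch releases (a `bool` query with a `filter` clause and a `terms` aggregation); the 1.x releases contemporary with this deck expressed the filter part with a `filtered` query instead. The field names and their mapping (`level` and `host` as keyword fields, `@timestamp` as a date) are assumptions:

```python
import json


def build_error_search(text, minutes=15, size=20):
    """Build a search body: full-text match on `message`, exact filters on
    `level` and a time range, plus a per-host breakdown of the hits.

    Field names and mappings are hypothetical.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"message": text}}],
                "filter": [
                    {"term": {"level": "ERROR"}},
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                ],
            }
        },
        "sort": [{"@timestamp": {"order": "desc"}}],
        "aggs": {"by_host": {"terms": {"field": "host", "size": 10}}},
    }


body = build_error_search("timeout")
# This JSON would be sent as the body of a search request such as
# POST /<index>/_search (the index name is left as a placeholder).
print(json.dumps(body, indent=2))
```

Putting the exact-match conditions in `filter` rather than `must` keeps them out of relevance scoring and lets Elasticsearch cache them.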
Elasticsearch: Distributed
Document Search
Elasticsearch: Distributed queries and searches using index shards and replicas
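The shard-based distribution in the diagram can be sketched in miniature: each document is routed to one primary shard by hashing a routing key (conceptually `hash(routing) % number_of_primary_shards`), and a search is scattered to every shard in parallel, with the partial hits gathered and merged. Recent Elasticsearch versions derive the shard from a Murmur3 hash of the `_routing` value (the document ID by default) and can also serve reads from replicas; this toy substitutes `zlib.crc32` and ignores replicas, scoring and failures:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


class ShardedIndex:
    """Toy index split into N primary shards (no replicas, no scoring)."""

    def __init__(self, num_shards=3):
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, doc_id):
        # Route by hash of the document ID modulo the shard count.
        # crc32 stands in for Elasticsearch's Murmur3 routing hash.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def index(self, doc_id, doc):
        self.shards[self.shard_for(doc_id)][doc_id] = doc

    @staticmethod
    def _search_shard(shard, term):
        # Per-shard work: naive substring match on the message field.
        return [(doc_id, doc) for doc_id, doc in shard.items()
                if term in doc.get("message", "")]

    def search(self, term):
        """Scatter the query to all shards in parallel, then gather and merge."""
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            partials = list(pool.map(lambda s: self._search_shard(s, term),
                                     self.shards))
        merged = [hit for partial in partials for hit in partial]
        return sorted(merged, key=lambda hit: hit[0])


idx = ShardedIndex(num_shards=3)
for i, msg in enumerate(["db timeout", "user login", "api timeout", "cache miss"]):
    idx.index(f"log-{i}", {"message": msg})
print([doc_id for doc_id, _ in idx.search("timeout")])  # ['log-0', 'log-2']
```

Because routing is a pure function of the ID, a get-by-ID touches exactly one shard, while a full-text search must visit all of them.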
A Really Cool UI to Show This Off
• Kibana – Works seamlessly with Elasticsearch and queries it
directly from JavaScript
• Everything is user driven, with very little coding beyond some configuration
settings in YAML
• Very dynamic screen interface
• Screen layout, queries, filters, graphs, histograms are saved directly to
Elasticsearch
• Great design and user interface
Putting It In Action: Demo
As a reminder, please submit your
questions in the chat box
We will get to as many as possible!
4/1/2014
Daily unique content
about content
management, user
experience, portals
and other enterprise
information technology
solutions across a
variety of industries.
Perficient.com/SocialMedia
Facebook.com/Perficient
Twitter.com/Perficient
Thank you for your participation today.
Please fill out the survey at the close of this session.
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About
 
Knowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data ScienceKnowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data Science
 
the Data World Distilled
the Data World Distilledthe Data World Distilled
the Data World Distilled
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 

More from Perficient, Inc.

Driving Strong 2020 Holiday Season Results
Driving Strong 2020 Holiday Season ResultsDriving Strong 2020 Holiday Season Results
Driving Strong 2020 Holiday Season ResultsPerficient, Inc.
 
Transforming Pharmacovigilance Workflows with AI & Automation
Transforming Pharmacovigilance Workflows with AI & Automation Transforming Pharmacovigilance Workflows with AI & Automation
Transforming Pharmacovigilance Workflows with AI & Automation Perficient, Inc.
 
The Secret to Acquiring and Retaining Customers in Financial Services
The Secret to Acquiring and Retaining Customers in Financial ServicesThe Secret to Acquiring and Retaining Customers in Financial Services
The Secret to Acquiring and Retaining Customers in Financial ServicesPerficient, Inc.
 
Oracle Strategic Modeling Live: Defined. Discussed. Demonstrated.
Oracle Strategic Modeling Live: Defined. Discussed. Demonstrated.Oracle Strategic Modeling Live: Defined. Discussed. Demonstrated.
Oracle Strategic Modeling Live: Defined. Discussed. Demonstrated.Perficient, Inc.
 
Content, Commerce, and... COVID
Content, Commerce, and... COVIDContent, Commerce, and... COVID
Content, Commerce, and... COVIDPerficient, Inc.
 
Centene's Financial Transformation Journey: A OneStream Success Story
Centene's Financial Transformation Journey: A OneStream Success StoryCentene's Financial Transformation Journey: A OneStream Success Story
Centene's Financial Transformation Journey: A OneStream Success StoryPerficient, Inc.
 
Automate Medical Coding With WHODrug Koda
Automate Medical Coding With WHODrug KodaAutomate Medical Coding With WHODrug Koda
Automate Medical Coding With WHODrug KodaPerficient, Inc.
 
Preparing for Your Oracle, Medidata, and Veeva CTMS Migration Project
Preparing for Your Oracle, Medidata, and Veeva CTMS Migration ProjectPreparing for Your Oracle, Medidata, and Veeva CTMS Migration Project
Preparing for Your Oracle, Medidata, and Veeva CTMS Migration ProjectPerficient, Inc.
 
Accelerating Partner Management: How Manufacturers Can Navigate Covid-19
Accelerating Partner Management: How Manufacturers Can Navigate Covid-19Accelerating Partner Management: How Manufacturers Can Navigate Covid-19
Accelerating Partner Management: How Manufacturers Can Navigate Covid-19Perficient, Inc.
 
The Critical Role of Audience Intelligence with Eric Enge and Rand Fishkin
The Critical Role of Audience Intelligence with Eric Enge and Rand FishkinThe Critical Role of Audience Intelligence with Eric Enge and Rand Fishkin
The Critical Role of Audience Intelligence with Eric Enge and Rand FishkinPerficient, Inc.
 
Cardtronics Future Ready with Oracle EPM Cloud
Cardtronics Future Ready with Oracle EPM CloudCardtronics Future Ready with Oracle EPM Cloud
Cardtronics Future Ready with Oracle EPM CloudPerficient, Inc.
 
Teams Summit - What is New and Coming
Teams Summit -  What is New and ComingTeams Summit -  What is New and Coming
Teams Summit - What is New and ComingPerficient, Inc.
 
Empower Your Organization with Teams & Remote Work Crisis Management
Empower Your Organization with Teams & Remote Work Crisis ManagementEmpower Your Organization with Teams & Remote Work Crisis Management
Empower Your Organization with Teams & Remote Work Crisis ManagementPerficient, Inc.
 
Adoption & Change Management Overview
Adoption & Change Management OverviewAdoption & Change Management Overview
Adoption & Change Management OverviewPerficient, Inc.
 
Microsoft Teams: Measuring Activity of Employees Working from Home
Microsoft Teams: Measuring Activity of Employees Working from HomeMicrosoft Teams: Measuring Activity of Employees Working from Home
Microsoft Teams: Measuring Activity of Employees Working from HomePerficient, Inc.
 
Securing Teams with Microsoft 365 Security for Remote Work
Securing Teams with Microsoft 365 Security for Remote WorkSecuring Teams with Microsoft 365 Security for Remote Work
Securing Teams with Microsoft 365 Security for Remote WorkPerficient, Inc.
 
Infrastructure Best Practices for Teams Remote Workers
Infrastructure Best Practices for Teams Remote WorkersInfrastructure Best Practices for Teams Remote Workers
Infrastructure Best Practices for Teams Remote WorkersPerficient, Inc.
 
Accelerate Adoption for Microsoft Teams
Accelerate Adoption for Microsoft TeamsAccelerate Adoption for Microsoft Teams
Accelerate Adoption for Microsoft TeamsPerficient, Inc.
 
Preparing for Project Cortex and the Future of Knowledge Management
Preparing for Project Cortex and the Future of Knowledge ManagementPreparing for Project Cortex and the Future of Knowledge Management
Preparing for Project Cortex and the Future of Knowledge ManagementPerficient, Inc.
 
Utilizing Microsoft 365 Security for Remote Work
Utilizing Microsoft 365 Security for Remote Work Utilizing Microsoft 365 Security for Remote Work
Utilizing Microsoft 365 Security for Remote Work Perficient, Inc.
 

More from Perficient, Inc. (20)

Driving Strong 2020 Holiday Season Results
Driving Strong 2020 Holiday Season ResultsDriving Strong 2020 Holiday Season Results
Driving Strong 2020 Holiday Season Results
 
Transforming Pharmacovigilance Workflows with AI & Automation
Transforming Pharmacovigilance Workflows with AI & Automation Transforming Pharmacovigilance Workflows with AI & Automation
Transforming Pharmacovigilance Workflows with AI & Automation
 
The Secret to Acquiring and Retaining Customers in Financial Services
The Secret to Acquiring and Retaining Customers in Financial ServicesThe Secret to Acquiring and Retaining Customers in Financial Services
The Secret to Acquiring and Retaining Customers in Financial Services
 
Oracle Strategic Modeling Live: Defined. Discussed. Demonstrated.
Oracle Strategic Modeling Live: Defined. Discussed. Demonstrated.Oracle Strategic Modeling Live: Defined. Discussed. Demonstrated.
Oracle Strategic Modeling Live: Defined. Discussed. Demonstrated.
 
Content, Commerce, and... COVID
Content, Commerce, and... COVIDContent, Commerce, and... COVID
Content, Commerce, and... COVID
 
Centene's Financial Transformation Journey: A OneStream Success Story
Centene's Financial Transformation Journey: A OneStream Success StoryCentene's Financial Transformation Journey: A OneStream Success Story
Centene's Financial Transformation Journey: A OneStream Success Story
 
Automate Medical Coding With WHODrug Koda
Automate Medical Coding With WHODrug KodaAutomate Medical Coding With WHODrug Koda
Automate Medical Coding With WHODrug Koda
 
Preparing for Your Oracle, Medidata, and Veeva CTMS Migration Project
Preparing for Your Oracle, Medidata, and Veeva CTMS Migration ProjectPreparing for Your Oracle, Medidata, and Veeva CTMS Migration Project
Preparing for Your Oracle, Medidata, and Veeva CTMS Migration Project
 
Accelerating Partner Management: How Manufacturers Can Navigate Covid-19
Accelerating Partner Management: How Manufacturers Can Navigate Covid-19Accelerating Partner Management: How Manufacturers Can Navigate Covid-19
Accelerating Partner Management: How Manufacturers Can Navigate Covid-19
 
The Critical Role of Audience Intelligence with Eric Enge and Rand Fishkin
The Critical Role of Audience Intelligence with Eric Enge and Rand FishkinThe Critical Role of Audience Intelligence with Eric Enge and Rand Fishkin
The Critical Role of Audience Intelligence with Eric Enge and Rand Fishkin
 
Cardtronics Future Ready with Oracle EPM Cloud
Cardtronics Future Ready with Oracle EPM CloudCardtronics Future Ready with Oracle EPM Cloud
Cardtronics Future Ready with Oracle EPM Cloud
 
Teams Summit - What is New and Coming
Teams Summit -  What is New and ComingTeams Summit -  What is New and Coming
Teams Summit - What is New and Coming
 
Empower Your Organization with Teams & Remote Work Crisis Management
Empower Your Organization with Teams & Remote Work Crisis ManagementEmpower Your Organization with Teams & Remote Work Crisis Management
Empower Your Organization with Teams & Remote Work Crisis Management
 
Adoption & Change Management Overview
Adoption & Change Management OverviewAdoption & Change Management Overview
Adoption & Change Management Overview
 
Microsoft Teams: Measuring Activity of Employees Working from Home
Microsoft Teams: Measuring Activity of Employees Working from HomeMicrosoft Teams: Measuring Activity of Employees Working from Home
Microsoft Teams: Measuring Activity of Employees Working from Home
 
Securing Teams with Microsoft 365 Security for Remote Work
Securing Teams with Microsoft 365 Security for Remote WorkSecuring Teams with Microsoft 365 Security for Remote Work
Securing Teams with Microsoft 365 Security for Remote Work
 
Infrastructure Best Practices for Teams Remote Workers
Infrastructure Best Practices for Teams Remote WorkersInfrastructure Best Practices for Teams Remote Workers
Infrastructure Best Practices for Teams Remote Workers
 
Accelerate Adoption for Microsoft Teams
Accelerate Adoption for Microsoft TeamsAccelerate Adoption for Microsoft Teams
Accelerate Adoption for Microsoft Teams
 
Preparing for Project Cortex and the Future of Knowledge Management
Preparing for Project Cortex and the Future of Knowledge ManagementPreparing for Project Cortex and the Future of Knowledge Management
Preparing for Project Cortex and the Future of Knowledge Management
 
Utilizing Microsoft 365 Security for Remote Work
Utilizing Microsoft 365 Security for Remote Work Utilizing Microsoft 365 Security for Remote Work
Utilizing Microsoft 365 Security for Remote Work
 

Recently uploaded

Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.francesco barbera
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncObject Automation
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxYounusS2
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 

Recently uploaded (20)

Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation Inc
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptx
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 

Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence from Machine Logs

  • 1. Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence from Machine Logs Eric Roch, Principal & Ben Hahn, Senior Technical Architect
  • 2. Perficient is a leading information technology consulting firm serving clients throughout North America. We help clients implement business-driven technology solutions that integrate business processes, improve worker productivity, increase customer loyalty and create a more agile enterprise to better respond to new business opportunities. About Perficient
  • 3. • Founded in 1997 • Public, NASDAQ: PRFT • 2013 revenue $373 million • Major market locations throughout North America • Atlanta, Boston, Charlotte, Chicago, Cincinnati, Columbus, Dallas, Denver, Detroit, Fairfax, Houston, Indianapolis, Los Angeles, Minneapolis, New Orleans, New York City, Northern California, Philadelphia, Southern California, St. Louis, Toronto and Washington, D.C. • Global delivery centers in China, Europe and India • >2,100 colleagues • Dedicated solution practices • ~90% repeat business rate • Alliance partnerships with major technology vendors • Multiple vendor/industry technology and growth awards Perficient Profile
  • 4. BUSINESS SOLUTIONS Business Intelligence Business Process Management Customer Experience and CRM Enterprise Performance Management Enterprise Resource Planning Experience Design (XD) Management Consulting TECHNOLOGY SOLUTIONS Business Integration/SOA Cloud Services Commerce Content Management Custom Application Development Education Information Management Mobile Platforms Platform Integration Portal & Social Our Solutions Expertise
  • 5. Eric Roch Principal Eric leads Perficient's national connected solutions practice • Includes focus on SOA/integration, cloud, mobile and Big Data • Author & industry speaker • 25+ years of experience in various aspects of information technology, including: • Executive-level management • Enterprise architecture • Application development Speakers Ben Hahn Sr. Technical Architect Ben Hahn is a Sr. Technical Architect • Includes focus on transactions, logging & exceptions processing • Author & speaker • 20+ years of experience in various aspects of information technology, including: • Software solutions • Enterprise infrastructure • Product management • Open Source software community contributor
  • 6. • Often defined as data that exceeds the capacity of conventional database systems because it is too large and moves too fast for them to handle in an architecturally cohesive way. The three V's of Big Data are: • Volume • Most companies have 100 TB of data • Facebook ingests 500 TB in a single day • 40 zettabytes (about 40 trillion GB) of data by 2020 • Velocity • NYSE captures 4-5 TB of data in a single day • A Boeing 737 generates 243 TB in a single flight • The Google self-driving car generates 750 MB of data per second! • Variety • Twitter, clickstreams, audio, video • GPS, sensor data, Facebook content • Infrastructure and application logs What is Big Data?
  • 7. POLL QUESTION: What is your current adoption level for big data? • Evaluation • Prototype • Production
  • 8. But Not Everyone is Google! Where’s the Big Data coming from?
  • 9. POLL QUESTION: Have you used open source software for big data solutions? • Yes • No
  • 10. Machine Data definitely has the three V’s of Big Data Machine Data is Big Data
  • 11. What Can We Gain From Machine Data? Valuable information can be mined from machine data, including: • Transaction monitoring • Error detection • Behavior trends • Audit logging • Infrastructure states • Anomaly detection • Geospatial analysis • Network analysis
  • 12. Log Analysis vs. Business Analytics • Ingest - versus ETL • Big Data - bidirectional integration with Hadoop • Query language - MapReduce functions on unstructured data • Drill anywhere - investigate all of the data rather than a predefined schema or cube • Information discovery - discover relationships based on patterns in the data • Ad hoc versus dimensional - log analysis is not based on a predefined structure built from a point-in-time set of requirements • Explicit logging - versus implicit correlation
  • 14. Innovations From Cloud and OSS • Hadoop and MapReduce - derived from Google's MapReduce and Google File System • Storm – distributed event processor open sourced by Twitter • Presto - SQL query engine, open sourced by Facebook, built to work with petabyte-sized data warehouses • Google BigQuery - run SQL-like queries against terabytes of data in seconds • Amazon DynamoDB - NoSQL database service to store and retrieve any amount of data and serve any level of request traffic • Elasticsearch – distributed full-text search with an OSS community
  • 16. • 2004 - Google published a paper called "MapReduce: Simplified Data Processing on Large Clusters", characterized by: • Mapping input to intermediate key-value pairs, shuffling those pairs by key, and then aggregating (reducing) each group • Origins in the map and reduce primitives of functional languages • Massive parallelism and elasticity via commodity hardware • Fault tolerance via master-worker nodes Big Data Processing: MapReduce
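The map, shuffle and reduce steps above can be illustrated in a single process. This is a minimal Python sketch, not Hadoop's or Google's API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `level_mapper`) and the sample log lines are hypothetical, and distribution across nodes and fault tolerance are left out.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records, mapper):
    # Apply the mapper to every input record, collecting intermediate (key, value) pairs.
    return [pair for record in records for pair in mapper(record)]


def shuffle_phase(pairs):
    # Group intermediate values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    # Fold each key's list of values into one aggregate value.
    return {key: reduce(reducer, values) for key, values in groups.items()}


def level_mapper(line):
    # Hypothetical mapper: emit (log level, 1) using the first token of the line.
    level = line.split(maxsplit=1)[0]
    yield level, 1


logs = ["ERROR db timeout", "INFO request ok", "ERROR disk full", "WARN slow query"]
counts = reduce_phase(shuffle_phase(map_phase(logs, level_mapper)), lambda a, b: a + b)
print(counts)  # {'ERROR': 2, 'INFO': 1, 'WARN': 1}
```

Because the mapper and reducer are passed in as plain functions, the same three phases can run a different job (for example, `max` latency per service) without changing the framework code.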
  • 17. • Based on lambda (λ) calculus • All computations and data can be expressed as functions and applications of functions • Declarative rather than imperative • First-class functions – functions can be passed as arguments and returned as results, just like values. This also enables currying and partial application. • Call by name – argument expressions are not evaluated until they are actually used. • Recursion – functions call themselves; repetition is expressed through recursion rather than loops. • Immutable state and values – pure functional programming works with immutable values as they exist at a point in time rather than with mutable variables. Because shared data cannot change underneath a running task, this greatly simplifies scalability and concurrency. • Referential transparency – a function call can be replaced by its value without changing the program's behavior, because functions have no side effects. • Pattern matching – matching on data types as well as destructuring nested data structures and objects • Examples: Erlang, Haskell, Lisp, Clojure, Scala What are functional languages? MapReduce Is Better with Functional Languages
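Several of these concepts can be shown even in a language that is not purely functional. This Python sketch, built around a hypothetical `LogEvent` type, illustrates immutable values, a pure predicate, partial application and a recursive higher-order function. Python has no tail-call optimization, so the recursive style here is for illustration only.

```python
from dataclasses import dataclass, replace
from functools import partial


@dataclass(frozen=True)
class LogEvent:
    # Immutable value: a point-in-time fact that cannot be modified in place.
    level: str
    latency_ms: int


def slower_than(threshold_ms, event):
    # Pure predicate: the same inputs always produce the same result.
    return event.latency_ms > threshold_ms


# Partial application: fix the first argument to obtain a new one-argument function.
is_slow = partial(slower_than, 500)


def count_matching(predicate, events):
    # Higher-order function (takes a function as an argument), written recursively.
    if not events:
        return 0
    head, *tail = events
    return (1 if predicate(head) else 0) + count_matching(predicate, tail)


events = (LogEvent("INFO", 120), LogEvent("WARN", 900), LogEvent("ERROR", 1500))
print(count_matching(is_slow, events))  # 2

# "Updating" an immutable value creates a new value; the original is unchanged.
retagged = replace(events[0], level="DEBUG")
print(events[0].level, retagged.level)  # INFO DEBUG
```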
  • 18. Imperative Model: Pascal, C, Basic, etc. Evolution (or Devolution?) of Databases
  • 19. Object-Oriented Programming Model: Java, C++, C#. Evolution (or Devolution?) of Databases
  • 20. Functional Programming Model: Scala, Clojure, F# Evolution (or Devolution?) of Databases • Because commodity hardware in the cloud is highly elastic, the resources needed to run queries and transactions can be scaled in response to data volumes at the storage level. • Data is stored using the functional programming concept of immutability: data is only appended, as point-in-time values. • MapReduce functions can be rebalanced and redistributed across machines as nodes fail or new nodes are added. • First-class functions and call by name allow functions and lambda expressions to be passed into MapReduce calls as arguments, so ad hoc functionality can be added. • Pattern matching allows very complex matches on complex structures such as XML. • Transactions use atomic operations such as compare-and-swap to help ensure ACID properties. • SQL or other query expressions can be reduced to MapReduce functions, lambda expressions and/or patterns and distributed in parallel across the nodes. • Using recursion, complex structures such as XML can be mapped and reduced from a single expression.
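The append-only, point-in-time storage and compare-and-swap ideas can be sketched as follows. `AppendOnlyRegister` is a hypothetical class, and a Python lock stands in for the atomic compare-and-swap a real store would provide; it is a toy under those assumptions, not how any particular database implements transactions.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    number: int
    value: str


class AppendOnlyRegister:
    """Hypothetical store: each write appends a new point-in-time version; nothing is overwritten."""

    def __init__(self, initial):
        self._history = [Version(0, initial)]
        # The lock simulates the atomicity that a real compare-and-swap primitive provides.
        self._lock = threading.Lock()

    def current(self):
        return self._history[-1]

    def compare_and_swap(self, expected_version, new_value):
        # Append only if nobody has written since the caller read expected_version.
        with self._lock:
            latest = self._history[-1]
            if latest.number != expected_version:
                return False  # conflict: the caller must re-read and retry
            self._history.append(Version(latest.number + 1, new_value))
            return True

    def as_of(self, version_number):
        # Earlier values remain readable because history is never modified.
        return self._history[version_number].value


reg = AppendOnlyRegister("status=UP")
seen = reg.current().number
print(reg.compare_and_swap(seen, "status=DEGRADED"))  # True
print(reg.compare_and_swap(seen, "status=DOWN"))      # False: stale version
print(reg.as_of(0), reg.current().value)              # status=UP status=DEGRADED
```

A failed swap tells the caller that another writer got there first, so it re-reads the latest version and retries instead of overwriting history.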
  • 21. MapReduce Machine Data: What Do We Need? • A dynamic process for parsing and mapping unstructured data to structured data in real time • Support for a wide range of data formats (text, XML, JSON, CSV, EDI, etc.) • Intelligent pattern-matching capabilities • Ability to correlate meaningful transactional data and metrics from disparate data (reducing) • Machine data is static and immutable, so append-only fast writes with eventual consistency are ideal • Fast filter, search and query capabilities to display results
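As a minimal sketch of the parsing and reducing steps, assuming two hypothetical log formats (a plain-text line with a `latency=` field and a JSON line), the code below maps each raw line to a structured record using a regular expression or `json.loads`, skips lines it cannot recognize, and then reduces the records to error counts per service.

```python
import json
import re
from collections import Counter

# Hypothetical plain-text format: "<timestamp> <LEVEL> <service> latency=<ms>"
TEXT_LINE = re.compile(
    r"(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>\w+)\s+latency=(?P<latency>\d+)"
)


def parse(line):
    """Map one raw line (plain text or JSON) to a structured record, or None if unrecognized."""
    line = line.strip()
    if line.startswith("{"):
        doc = json.loads(line)
        return {"ts": doc["ts"], "level": doc["level"],
                "service": doc["service"], "latency": int(doc["latency"])}
    match = TEXT_LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["latency"] = int(record["latency"])
    return record


raw = [
    "2014-04-01T10:15:02 ERROR payment latency=1500",
    '{"ts": "2014-04-01T10:15:03", "level": "INFO", "service": "search", "latency": 40}',
    "2014-04-01T10:15:04 ERROR payment latency=900",
    "garbage line",
]
records = [r for r in map(parse, raw) if r is not None]
# Reduce step: correlate the structured records into a per-service error metric.
errors_by_service = Counter(r["service"] for r in records if r["level"] == "ERROR")
print(errors_by_service)  # Counter({'payment': 2})
```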
  • 22. Open Source Big Data Landscape Source: www.bigdata-startup.com
  • 23. Apache Hadoop: The Elephant in the Room • What about Apache Hadoop? • Apache Hadoop comprises HDFS and Hadoop MapReduce, based on Google's GFS and MapReduce respectively • Batch-oriented MapReduce jobs run through schedulers and JobTrackers • We require real-time MapReduce processes • We need to index, query and search data in real time through a well-defined interface • We can use Hadoop for secondary storage of long-term persistent logs - Lambda Architecture (Batch vs. Speed Layer)
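A Lambda Architecture query merges a batch view, recomputed periodically from the full history, with a speed-layer view that covers only events since the last batch run. The Python below is a hypothetical in-memory sketch of that merge; in practice the batch layer might be a Hadoop job and the speed layer a Storm topology.

```python
from collections import Counter


def batch_view(master_dataset):
    # Batch layer: recompute from the full, immutable event history (e.g., a periodic Hadoop job).
    return Counter(e["service"] for e in master_dataset if e["level"] == "ERROR")


class SpeedLayer:
    """Speed layer: incrementally counts only the events that arrived after the last batch run."""

    def __init__(self):
        self.view = Counter()

    def on_event(self, event):
        if event["level"] == "ERROR":
            self.view[event["service"]] += 1

    def reset(self):
        # Called once a new batch view has absorbed these events.
        self.view = Counter()


def query(service, batch, speed):
    # Serving layer: merge the precomputed batch view with the real-time delta.
    return batch[service] + speed.view[service]


history = [{"service": "payment", "level": "ERROR"}, {"service": "payment", "level": "INFO"}]
batch = batch_view(history)
speed = SpeedLayer()
speed.on_event({"service": "payment", "level": "ERROR"})
print(query("payment", batch, speed))  # 2
```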
  • 24. Apache Storm: Use Real-time MapReduce for Machine Data Streams • Developed at BackType, which was acquired by Twitter • Distributed computational framework that provides real-time MapReduce functionality on any data stream using the concepts of spouts and bolts • Read from any data stream using spouts (Kafka, JMS, HTTP, etc.) • Transactional and guaranteed message processing • Parallelism and scalability • Fault tolerance (master-worker for MapReduce) • MapReduce topologies • Offers real-time MapReduce jobs (bolts) • Other tools: Apache Spark
  • 25. Apache Storm: Use Real-time MapReduce for Machine Data Streams MapReduce - the declarative simplicity of functional languages within Storm
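Storm itself runs on the JVM, and topologies are normally wired with its Java `TopologyBuilder`; the code below does not use Storm. It is a single-process Python simulation, with hypothetical names, of the same dataflow: a spout emits tuples, a parsing bolt acts as the map step, and counting bolts act as the reduce step, with a fields-grouping-style hash so each key always reaches the same task.

```python
from collections import Counter
from zlib import crc32


def log_spout(lines):
    # Spout: emits a stream of tuples (an in-memory list here instead of Kafka or JMS).
    for line in lines:
        yield (line,)


def parse_bolt(stream):
    # Map bolt: turn each raw line into a (level, service) tuple.
    for (line,) in stream:
        level, service = line.split()[:2]
        yield (level, service)


class CountBolt:
    """Reduce bolt: keeps a running count per (level, service) key."""

    def __init__(self):
        self.counts = Counter()

    def execute(self, tup):
        self.counts[tup] += 1


def run_topology(lines, parallelism=2):
    tasks = [CountBolt() for _ in range(parallelism)]
    for tup in parse_bolt(log_spout(lines)):
        # Fields-grouping-style routing: a given key always goes to the same task.
        task_index = crc32("|".join(tup).encode()) % parallelism
        tasks[task_index].execute(tup)
    merged = Counter()
    for task in tasks:
        merged.update(task.counts)
    return merged, tasks


counts, tasks = run_topology(["ERROR payment timeout", "INFO search ok", "ERROR payment retry"])
print(counts[("ERROR", "payment")])  # 2
```

Routing by key is what keeps each running count correct when the counting work is spread across several parallel tasks.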
  • 26. Elasticsearch: Distributed Document Search • Distributed search server built on Apache Lucene • Schema-less document store using JSON as its document format. New fields can be added dynamically, and all fields are indexed by default • Uses index shards to distribute queries and searches across the cluster, where they run in parallel • A cluster can host multiple indexes, which can be queried singly or as a group. Index aliases allow indexes to be added or dropped dynamically • Append-only model using versioning. Writes are very fast, depending on the wait model (wait for all shards to be written, a quorum, or none) • Well-defined RESTful API with very powerful query features • Other tools: Apache Solr
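Shard routing and parallel search can be sketched with a toy index. Elasticsearch documents a routing rule of the form `shard = hash(_routing) % number_of_primary_shards`, with `_routing` defaulting to the document ID; this hypothetical `ShardedIndex` uses `crc32` as a stand-in for Elasticsearch's own hash function and searches shards sequentially rather than in parallel.

```python
from zlib import crc32


class ShardedIndex:
    """Toy index: documents are routed to shards; searches fan out to all shards and merge."""

    def __init__(self, num_primary_shards=3):
        self.shards = [{} for _ in range(num_primary_shards)]

    def shard_for(self, doc_id):
        # Same shape as hash(_routing) % number_of_primary_shards,
        # with crc32 standing in for Elasticsearch's own hash function.
        return crc32(doc_id.encode()) % len(self.shards)

    def index(self, doc_id, doc):
        # Schema-less: any JSON-like dict is accepted, including new fields.
        self.shards[self.shard_for(doc_id)][doc_id] = doc

    def search(self, field, value):
        # Scatter: query every shard (sequentially here; in parallel in a real cluster).
        per_shard = [
            [(doc_id, doc) for doc_id, doc in shard.items() if doc.get(field) == value]
            for shard in self.shards
        ]
        # Gather: merge the per-shard hits into one response, ordered by ID.
        return sorted((hit for hits in per_shard for hit in hits), key=lambda hit: hit[0])


idx = ShardedIndex()
idx.index("log-1", {"level": "ERROR", "service": "payment"})
idx.index("log-2", {"level": "INFO", "service": "search"})
idx.index("log-3", {"level": "ERROR", "service": "search", "trace_id": "abc"})
print([doc_id for doc_id, _ in idx.search("level", "ERROR")])  # ['log-1', 'log-3']
```

Because the target shard depends on the number of primary shards, changing that number would send existing IDs to different shards, which is one reason the primary shard count is normally fixed when an index is created.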
  • 27. Elasticsearch: Distributed Document Search • Distributed queries and searches using index shards and replicas
  • 28. A Really Cool UI to Show This Off • Kibana – works seamlessly with Elasticsearch and queries it directly from JavaScript • Everything is user driven, with very little coding beyond some configuration settings in YAML • Very dynamic screen interface • Screen layouts, queries, filters, graphs and histograms are saved directly to Elasticsearch • Great design and user interface
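Kibana's histograms are backed by Elasticsearch aggregation queries. The function below builds a comparable Query DSL request body, assuming a hypothetical mapping with keyword fields `level` and `service` and a `@timestamp` date field; the body could be sent to an index's `_search` endpoint. It uses `fixed_interval`, which current Elasticsearch releases accept for `date_histogram`; older releases, including the 1.x line current when this webinar was given, used `interval` and a different filter syntax.

```python
import json


def error_histogram_request(service, minutes=15, bucket="1m"):
    """Build a Query DSL body: recent ERROR events for one service, bucketed over time."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the raw hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"level": "ERROR"}},
                    {"term": {"service": service}},
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                ]
            }
        },
        "aggs": {
            "errors_over_time": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": bucket}
            }
        },
    }


print(json.dumps(error_histogram_request("payment"), indent=2))
```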
  • 29. Putting It In Action: Demo
  • 30. As a reminder, please submit your questions in the chat box. We will get to as many as possible! 4/1/2014
  • 31. Daily unique content about content management, user experience, portals and other enterprise information technology solutions across a variety of industries. Perficient.com/SocialMedia Facebook.com/Perficient Twitter.com/Perficient
  • 32. Thank you for your participation today. Please fill out the survey at the close of this session.