SlideShare a Scribd company logo
Overview of Modern Software Ecosystem
for Big Data Analysis
Michael Bryzek
mbryzek@alum.mit.edu / @mbryzek
Co-Founder and Chairman Flow Commerce
Co-Founder and ex-CTO Gilt Groupe
Trillion Sensors Summit - Dec 9 2015
Overview of modern practices related to software architecture for high volume
big data applications
Encourage reuse of infrastructure that has already been built so you can
focus on analysis and information
Goals
Representational State Transfer (REST)
a uniform connector interface
● Resources - “nouns”
● Clear set of limited methods
● Standard (e.g. authorization)
Cost of integration of nth
service approaches 0
Roy Thomas Fielding’s Dissertation - https://www.ics.uci.edu/~fielding/pubs/dissertation/fielding_dissertation.pdf
examples
Stripe
Twilio
Github
Frameworks for REST
API first - the most critical design element
● http://apidoc.me *
● http://swagger.io
● http://apiary.io
Some companies (incl. Amazon) focus on API
and care very little about the implementation.
* my personal open source project
JSON - The “fat-free” alternative to XML
Javascript Object Notation (JSON)
It’s just javascript - the most widely adopted
programming language in the world
Pros Cons
Simple
Readable
Dense
Still verbose
No strong typing
CPU overhead
Binary Protocols - Ideal for sensor data
Key Features
● Language to describe schema
● Space efficient
● Fast serialization / deserialization
Leading Protocols
● Protocol Buffers https://developers.google.com/protocol-buffers
● Avro https://avro.apache.org/ - tight integration with Hadoop
● Thrift https://thrift.apache.org/
Example: Avro Schema Definition
Data processing and analytics
From https://aws.amazon.com/iot/
Data Platforms
● https://aws.amazon.com/iot/ - Amazon Kinesis, S3, Redshift, IOT -
● http://influxdata.com -open source time series database + analytics platform*
● http://confluent.io - data pipeline / real time processing built by Jay Kreps
● http://spark.apache.org/ - UC Berkeley / Cloudera led effort
Currently seeing high activity and investment in both open source and commercial
ventures.
* I am an investor in influx
Summary and Recommendation
Learning from history of evolution of software on internet…
● Define standards for interconnectivity (ala REST)
○ Avoid standards for data types (e.g. ECG)
● Choose simplicity as number one requirement
○ Avoid XML
● Adopt existing binary protocols, w/ code generation at boundaries
○ Avoid creating new protocols focused on last 5-10% improvement
● Adopt existing messaging / storage platforms for large data sets
Keeping up to date: https://www.thoughtworks.com/radar
Thank you
Michael Bryzek
mbryzek@alum.mit.edu / @mbryzek
Co-Founder and Chairman Flow Commerce
Co-Founder and ex-CTO Gilt
Trillion Sensors Summit - Dec 9 2015

More Related Content

What's hot

Case Study: How did we reduce the build time to one fifth?
Case Study: How did we reduce the build time to one fifth?Case Study: How did we reduce the build time to one fifth?
Case Study: How did we reduce the build time to one fifth?
Péter Takács
 
Que hay de nuevo en Visual Studio 2013 y ASP.NET 5.1
Que hay de nuevo en Visual Studio 2013 y ASP.NET 5.1Que hay de nuevo en Visual Studio 2013 y ASP.NET 5.1
Que hay de nuevo en Visual Studio 2013 y ASP.NET 5.1
Rodolfo Finochietti
 
Azure Functions
Azure FunctionsAzure Functions
Azure Functions
Rodolfo Finochietti
 
Logic Apps – Deployments
Logic Apps – DeploymentsLogic Apps – Deployments
Logic Apps – Deployments
BizTalk360
 
Blazing fast sites using Blaze, Hybrid CMS NYC
Blazing fast sites using Blaze, Hybrid CMS NYCBlazing fast sites using Blaze, Hybrid CMS NYC
Blazing fast sites using Blaze, Hybrid CMS NYC
Jesus Manuel Olivas
 
O365Con18 - Working with PowerShell, VS Code and GitHub - Thomas Vochten
O365Con18 - Working with PowerShell, VS Code and GitHub - Thomas VochtenO365Con18 - Working with PowerShell, VS Code and GitHub - Thomas Vochten
O365Con18 - Working with PowerShell, VS Code and GitHub - Thomas Vochten
NCCOMMS
 
London React August - GraphQL at The Financial Times - Viktor Charypar
London React August - GraphQL at The Financial Times - Viktor CharyparLondon React August - GraphQL at The Financial Times - Viktor Charypar
London React August - GraphQL at The Financial Times - Viktor Charypar
React London Community
 
Create a modern(ish) BAM portal in (roughly) one hour!
Create a modern(ish) BAM portal in (roughly) one hour!Create a modern(ish) BAM portal in (roughly) one hour!
Create a modern(ish) BAM portal in (roughly) one hour!
BizTalk360
 
C#: Past, Present and Future
C#: Past, Present and FutureC#: Past, Present and Future
C#: Past, Present and Future
Rodolfo Finochietti
 
Taking Control of your Data with GraphQL
Taking Control of your Data with GraphQLTaking Control of your Data with GraphQL
Taking Control of your Data with GraphQL
Vinci Rufus
 
Lessons learned: Choosing your documentation system
Lessons learned: Choosing your documentation systemLessons learned: Choosing your documentation system
Lessons learned: Choosing your documentation system
Pronovix
 
Advancing Your API Strategy in an Infrastructure World
Advancing Your API Strategy in an Infrastructure WorldAdvancing Your API Strategy in an Infrastructure World
Advancing Your API Strategy in an Infrastructure World
Pronovix
 
Introduction to Grails
Introduction to GrailsIntroduction to Grails
Introduction to Grails
Hiten Pratap Singh
 
CI/CD with Bitbucket pipelines
CI/CD with Bitbucket pipelinesCI/CD with Bitbucket pipelines
CI/CD with Bitbucket pipelines
Theophilus Omoregbee
 
GraphQL in an Age of REST
GraphQL in an Age of RESTGraphQL in an Age of REST
GraphQL in an Age of REST
Yos Riady
 
Getting started with mobile application development
Getting started with mobile application developmentGetting started with mobile application development
Getting started with mobile application development
ColdFusionConference
 
Azure Integration in Production with Logic Apps and more
Azure Integration in Production with Logic Apps and moreAzure Integration in Production with Logic Apps and more
Azure Integration in Production with Logic Apps and more
BizTalk360
 
Azure Web Jobs
Azure Web JobsAzure Web Jobs
Azure Web Jobs
BizTalk360
 
Net developer days presentation
Net developer days   presentationNet developer days   presentation
Net developer days presentation
Alexandre Malavasi
 

What's hot (20)

Case Study: How did we reduce the build time to one fifth?
Case Study: How did we reduce the build time to one fifth?Case Study: How did we reduce the build time to one fifth?
Case Study: How did we reduce the build time to one fifth?
 
Que hay de nuevo en Visual Studio 2013 y ASP.NET 5.1
Que hay de nuevo en Visual Studio 2013 y ASP.NET 5.1Que hay de nuevo en Visual Studio 2013 y ASP.NET 5.1
Que hay de nuevo en Visual Studio 2013 y ASP.NET 5.1
 
Azure Functions
Azure FunctionsAzure Functions
Azure Functions
 
Logic Apps – Deployments
Logic Apps – DeploymentsLogic Apps – Deployments
Logic Apps – Deployments
 
Blazing fast sites using Blaze, Hybrid CMS NYC
Blazing fast sites using Blaze, Hybrid CMS NYCBlazing fast sites using Blaze, Hybrid CMS NYC
Blazing fast sites using Blaze, Hybrid CMS NYC
 
O365Con18 - Working with PowerShell, VS Code and GitHub - Thomas Vochten
O365Con18 - Working with PowerShell, VS Code and GitHub - Thomas VochtenO365Con18 - Working with PowerShell, VS Code and GitHub - Thomas Vochten
O365Con18 - Working with PowerShell, VS Code and GitHub - Thomas Vochten
 
London React August - GraphQL at The Financial Times - Viktor Charypar
London React August - GraphQL at The Financial Times - Viktor CharyparLondon React August - GraphQL at The Financial Times - Viktor Charypar
London React August - GraphQL at The Financial Times - Viktor Charypar
 
Create a modern(ish) BAM portal in (roughly) one hour!
Create a modern(ish) BAM portal in (roughly) one hour!Create a modern(ish) BAM portal in (roughly) one hour!
Create a modern(ish) BAM portal in (roughly) one hour!
 
C#: Past, Present and Future
C#: Past, Present and FutureC#: Past, Present and Future
C#: Past, Present and Future
 
Tce automation-d4
Tce automation-d4Tce automation-d4
Tce automation-d4
 
Taking Control of your Data with GraphQL
Taking Control of your Data with GraphQLTaking Control of your Data with GraphQL
Taking Control of your Data with GraphQL
 
Lessons learned: Choosing your documentation system
Lessons learned: Choosing your documentation systemLessons learned: Choosing your documentation system
Lessons learned: Choosing your documentation system
 
Advancing Your API Strategy in an Infrastructure World
Advancing Your API Strategy in an Infrastructure WorldAdvancing Your API Strategy in an Infrastructure World
Advancing Your API Strategy in an Infrastructure World
 
Introduction to Grails
Introduction to GrailsIntroduction to Grails
Introduction to Grails
 
CI/CD with Bitbucket pipelines
CI/CD with Bitbucket pipelinesCI/CD with Bitbucket pipelines
CI/CD with Bitbucket pipelines
 
GraphQL in an Age of REST
GraphQL in an Age of RESTGraphQL in an Age of REST
GraphQL in an Age of REST
 
Getting started with mobile application development
Getting started with mobile application developmentGetting started with mobile application development
Getting started with mobile application development
 
Azure Integration in Production with Logic Apps and more
Azure Integration in Production with Logic Apps and moreAzure Integration in Production with Logic Apps and more
Azure Integration in Production with Logic Apps and more
 
Azure Web Jobs
Azure Web JobsAzure Web Jobs
Azure Web Jobs
 
Net developer days presentation
Net developer days   presentationNet developer days   presentation
Net developer days presentation
 

Similar to Overview of modern software ecosystem for big data analysis

DevOps-Roadmap
DevOps-RoadmapDevOps-Roadmap
DevOps-Roadmap
BnhNguynHuy1
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
Sion Smith
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
IanFurlong4
 
Industry Ontologies: Case Studies in Creating and Extending Schema.org for In...
Industry Ontologies: Case Studies in Creating and Extending Schema.org for In...Industry Ontologies: Case Studies in Creating and Extending Schema.org for In...
Industry Ontologies: Case Studies in Creating and Extending Schema.org for In...
MakoLab SA
 
Industry Ontologies: Case Studies in Creating and Extending Schema.org
Industry Ontologies: Case Studies in Creating and Extending Schema.org Industry Ontologies: Case Studies in Creating and Extending Schema.org
Industry Ontologies: Case Studies in Creating and Extending Schema.org
sopekmir
 
Better integrations through open interfaces
Better integrations through open interfacesBetter integrations through open interfaces
Better integrations through open interfacesSteve Speicher
 
Sharepoint 2010 architecture, ha and dr (tig)
Sharepoint 2010 architecture, ha and dr (tig)Sharepoint 2010 architecture, ha and dr (tig)
Sharepoint 2010 architecture, ha and dr (tig)Tihomir Ignatov
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
DataWorks Summit/Hadoop Summit
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
DataWorks Summit/Hadoop Summit
 
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitAnalysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Slim Baltagi
 
NLP and the Web
NLP and the WebNLP and the Web
NLP and the Web
mattthemathman
 
Semantics on services: the story so far (SALAD2015 keynote at ESWC2015)
Semantics on services: the story so far (SALAD2015 keynote at ESWC2015)Semantics on services: the story so far (SALAD2015 keynote at ESWC2015)
Semantics on services: the story so far (SALAD2015 keynote at ESWC2015)
Sergio Fernández
 
Msr2021 tutorial-di penta
Msr2021 tutorial-di pentaMsr2021 tutorial-di penta
Msr2021 tutorial-di penta
Massimiliano Di Penta
 
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
Srijan Technologies
 
I want to be a Data DJ!
I want to be a Data DJ!I want to be a Data DJ!
I want to be a Data DJ!
Paul Groth
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache Airflow
Kaxil Naik
 
IWMW 2002: Web standards briefing (session C2)
IWMW 2002: Web standards briefing (session C2)IWMW 2002: Web standards briefing (session C2)
IWMW 2002: Web standards briefing (session C2)
IWMW
 
Big Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI MobileBig Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI Mobile
Roy Kim
 
Cloud Native Development
Cloud Native DevelopmentCloud Native Development
Cloud Native Development
Manuel Garcia
 
Deploying and Managing Artificial Intelligence Services using the Open Data H...
Deploying and Managing Artificial Intelligence Services using the Open Data H...Deploying and Managing Artificial Intelligence Services using the Open Data H...
Deploying and Managing Artificial Intelligence Services using the Open Data H...
Orgad Kimchi
 

Similar to Overview of modern software ecosystem for big data analysis (20)

DevOps-Roadmap
DevOps-RoadmapDevOps-Roadmap
DevOps-Roadmap
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
 
Industry Ontologies: Case Studies in Creating and Extending Schema.org for In...
Industry Ontologies: Case Studies in Creating and Extending Schema.org for In...Industry Ontologies: Case Studies in Creating and Extending Schema.org for In...
Industry Ontologies: Case Studies in Creating and Extending Schema.org for In...
 
Industry Ontologies: Case Studies in Creating and Extending Schema.org
Industry Ontologies: Case Studies in Creating and Extending Schema.org Industry Ontologies: Case Studies in Creating and Extending Schema.org
Industry Ontologies: Case Studies in Creating and Extending Schema.org
 
Better integrations through open interfaces
Better integrations through open interfacesBetter integrations through open interfaces
Better integrations through open interfaces
 
Sharepoint 2010 architecture, ha and dr (tig)
Sharepoint 2010 architecture, ha and dr (tig)Sharepoint 2010 architecture, ha and dr (tig)
Sharepoint 2010 architecture, ha and dr (tig)
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitAnalysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
 
NLP and the Web
NLP and the WebNLP and the Web
NLP and the Web
 
Semantics on services: the story so far (SALAD2015 keynote at ESWC2015)
Semantics on services: the story so far (SALAD2015 keynote at ESWC2015)Semantics on services: the story so far (SALAD2015 keynote at ESWC2015)
Semantics on services: the story so far (SALAD2015 keynote at ESWC2015)
 
Msr2021 tutorial-di penta
Msr2021 tutorial-di pentaMsr2021 tutorial-di penta
Msr2021 tutorial-di penta
 
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
 
I want to be a Data DJ!
I want to be a Data DJ!I want to be a Data DJ!
I want to be a Data DJ!
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache Airflow
 
IWMW 2002: Web standards briefing (session C2)
IWMW 2002: Web standards briefing (session C2)IWMW 2002: Web standards briefing (session C2)
IWMW 2002: Web standards briefing (session C2)
 
Big Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI MobileBig Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI Mobile
 
Cloud Native Development
Cloud Native DevelopmentCloud Native Development
Cloud Native Development
 
Deploying and Managing Artificial Intelligence Services using the Open Data H...
Deploying and Managing Artificial Intelligence Services using the Open Data H...Deploying and Managing Artificial Intelligence Services using the Open Data H...
Deploying and Managing Artificial Intelligence Services using the Open Data H...
 

Recently uploaded

Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 

Recently uploaded (20)

Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 

Overview of modern software ecosystem for big data analysis

  • 1. Overview of Modern Software Ecosystem for Big Data Analysis Michael Bryzek mbryzek@alum.mit.edu / @mbryzek Co-Founder and Chairman Flow Commerce Co-Founder and ex-CTO Gilt Groupe Trillion Sensors Summit - Dec 9 2015
  • 2. Overview of modern practices related to software architecture for high volume big data applications Encourage reuse of infrastructure that has already been built so you can focus on analysis and information Goals
  • 3. Representational State Transfer (REST) a uniform connector interface ● Resources - “nouns” ● Clear set of limited methods ● Standard (e.g. authorization) Cost of integration of nth service approaches 0 Roy Thomas Fielding’s Dissertation - https://www.ics.uci.edu/~fielding/pubs/dissertation/fielding_dissertation.pdf examples Stripe Twilio Github
  • 4. Frameworks for REST API first - the most critical design element ● http://apidoc.me * ● http://swagger.io ● http://apiary.io Some companies (incl. Amazon) focus on API and care very little about the implementation. * my personal open source project
  • 5. JSON - The “fat-free” alternative to XML
  • 6. Javascript Object Notation (JSON) It’s just javascript - the most widely adopted programming language in the world Pros Cons Simple Readable Dense Still verbose No strong typing CPU overhead
  • 7.
  • 8. Binary Protocols - Ideal for sensor data Key Features ● Language to describe schema ● Space efficient ● Fast serialization / deserialization Leading Protocols ● Protocol Buffers https://developers.google.com/protocol-buffers ● Avro https://avro.apache.org/ - tight integration with Hadoop ● Thrift https://thrift.apache.org/
  • 9. Example: Avro Schema Definition
  • 10. Data processing and analytics From https://aws.amazon.com/iot/
  • 11. Data Platforms ● https://aws.amazon.com/iot/ - Amazon Kinesis, S3, Redshift, IOT - ● http://influxdata.com -open source time series database + analytics platform* ● http://confluent.io - data pipeline / real time processing built by Jay Kreps ● http://spark.apache.org/ - UC Berkeley / Cloudera led effort Currently seeing high activity and investment in both open source and commercial ventures. * I am an investor in influx
  • 12. Summary and Recommendation Learning from history of evolution of software on internet… ● Define standards for interconnectivity (ala REST) ○ Avoid standards for data types (e.g. ECG) ● Choose simplicity as number one requirement ○ Avoid XML ● Adopt existing binary protocols, w/ code generation at boundaries ○ Avoid creating new protocols focused on last 5-10% improvement ● Adopt existing messaging / storage platforms for large data sets Keeping up to date: https://www.thoughtworks.com/radar
  • 13. Thank you Michael Bryzek mbryzek@alum.mit.edu / @mbryzek Co-Founder and Chairman Flow Commerce Co-Founder and ex-CTO Gilt Trillion Sensors Summit - Dec 9 2015