hack/reduce is a community and hackspace for working with big data that provides access to a computing cluster, holds regular hackathons, and allows users to work with large datasets containing millions or billions of records using tools like Hadoop and MapReduce to find patterns and extract new information. The computing cluster has 240 cores, 240GB of RAM, and 10TB of disk space available for exploring open datasets like government documents, weather records, and transportation data.
This document provides an overview of big data, including its definition, sources, databases, and analytics. It defines big data as large datasets greater than terabytes in size that are increasingly being collected from various sources such as science, social media, government and more. It notes that most data is unstructured. It also discusses the evolution of databases from relational SQL databases to non-relational NoSQL databases and Hadoop. Finally, it outlines the major tools and technologies used for big data analytics, including MapReduce, Hadoop, and machine learning.
The document discusses big data and Hadoop. It defines big data as highly scalable integration, storage, and analysis of poly-structured data. It describes how Hadoop can be used for tasks like ads/recommendations, travel processing, mobile data processing, energy savings, infrastructure management, image processing, fraud detection, IT security, and healthcare. It also discusses NoSQL databases and Hive Query Language. Finally, it notes that big data requires new data specialists like Hadoop specialists and data scientists.
Gail Zhou on "Big Data Technology, Strategy, and Applications" (Gail Zhou, MBA, PhD)
Dr. Gail Zhou presented this topic at DevNexus on Feb 25, 2014. The talk covered Big Data history, opportunities, and applications; key concepts and a reference architecture built on open source technology stacks; the Hadoop architecture explained (HDFS, MapReduce, and YARN); Big Data start-up challenges and strategies to overcome them; and a technology update on Hadoop- and Cassandra-based offerings.
Big Data is a hot topic that has captured the attention of the IT industry globally. It is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. Big data may be as important to business – and society – as the Internet has become. More accurate analyses may lead to more confident decision making, and better decisions can mean greater operational efficiency, cost reductions, and reduced risk.
This presentation focuses on the why, what, and how of big data as we explore some of Microsoft's big data solutions, the HDInsight Azure service and Power BI, providing insights into the world of big data.
Big data refers to large and complex datasets that are difficult to process using traditional database management systems. It includes both structured and unstructured data from sources like social media, sensors, business transactions, and more. Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of commodity hardware. It solves big data problems through massively parallel processing using its core components: HDFS for storage and MapReduce for distributed computing.
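To make the HDFS/MapReduce division of labour described above concrete, here is a minimal Python sketch that simulates the map, shuffle, and reduce phases of a word count on a single machine. It illustrates only the programming model: it uses none of the actual Hadoop APIs, and the sample documents are invented.

# Local simulation of the MapReduce word-count pattern that Hadoop
# distributes across a cluster (illustration only, not the Hadoop API).
from collections import defaultdict

def map_phase(lines):
    # map: emit (key, value) pairs -- here (word, 1) for every word
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    # shuffle/sort: group values by key, as the framework does between phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reduce: aggregate the values for each key -- here, summing the counts
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data needs big tools", "hadoop distributes big jobs"]
print(reduce_phase(shuffle(map_phase(docs))))
# {'big': 3, 'data': 1, 'needs': 1, 'tools': 1, 'hadoop': 1, 'distributes': 1, 'jobs': 1}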
This talk shows Jusbrasil's journey in adopting Apache HBase, an open source project inspired by Google's seminal BigTable paper: the decisions that led us to choose it, the problems we ran into, and the plot twists along the way of this adventure. [Presented at NOSQLBA 2019 http://www.nosqlba.com/2019/index.html]
This document discusses how big data is used in Indonesia's pandemic response. It provides an overview of big data and its implementation at the Ministry of Health to manage COVID-19 data. Large volumes of structured and unstructured data from various sources are extracted, transformed, and loaded into a Hortonworks Hadoop ecosystem daily. This data is then analyzed with Hive and BigSQL, summarized, and visualized in Tableau dashboards. Lessons learned include the importance of data availability, consistency, and governance to produce insights that help decision making during the pandemic.
This document discusses big data and how enterprises are adopting big data solutions. It describes how data has exploded in terms of volume, velocity, and variety. Big data now includes structured, semi-structured, and unstructured data from sources like sensors, social media, and machine logs. The document outlines how Hadoop has become a popular big data platform that provides scalable and cost-effective storage and processing of large, complex datasets. It also discusses how enterprises are using big data for applications like predictive analytics, social intelligence, and mobile analytics to drive insights and decisions.
This document provides an overview of graph databases and Neo4j. It discusses how graph databases are better suited than relational databases for interconnected data and have simpler data models. Neo4j is highlighted as a graph database that uses nodes, edges and properties to represent data and uses the Cypher query language. It is fully ACID compliant, open source, and has a large active community.
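As a hedged illustration of the node/edge/property model and Cypher described above, here is a minimal sketch using the official neo4j Python driver; the connection URI, credentials, and the tiny Person/KNOWS graph are assumptions for the example.

# Sketch: create and traverse connected data in Neo4j with Cypher.
# URI, credentials, and the data model below are illustrative assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Nodes with properties, linked by a relationship (edge):
    session.run(
        "MERGE (a:Person {name: $a}) "
        "MERGE (b:Person {name: $b}) "
        "MERGE (a)-[:KNOWS]->(b)",
        a="Alice", b="Bob",
    )
    # Traversing the relationship -- what would be a join in a relational DB:
    for record in session.run(
        "MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name AS a, b.name AS b"
    ):
        print(record["a"], "knows", record["b"])

driver.close()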
An overview of several technologies that contribute to the landscape of Big Data. An introduction to the technology challenges of Big Data, followed by key open-source components that help with various big data aspects such as OLAP, real-time online analytics, and machine learning on MapReduce. I conclude with an enumeration of the key areas where those technologies are most likely to unlock new opportunities for various businesses.
Radoop is a tool that integrates Hadoop, Hive, and Mahout capabilities into RapidMiner's user-friendly interface. It allows users to perform scalable data analysis on large datasets stored in Hadoop. Radoop addresses the growing amounts of structured and unstructured data by leveraging Hadoop's distributed file system (HDFS) and MapReduce framework. Key benefits of Radoop include its scalability for large data volumes, its graphical user interface that eliminates ETL bottlenecks, and its ability to perform machine learning and analytics on Hadoop clusters.
This document provides a brief history of data from ancient times to the present day. It discusses how humans started counting and recording data visually over 20,000 years ago. Written language emerged around 3,500 BC, allowing data to be recorded and transmitted. Major developments include the first library, around 1250 BC, to store data en masse; the origin of maps around 1150 BC; and the use of numbers and logic to derive insights from roughly 350 BC to 100 BC. Significant milestones from 1600 to 1900 include the development of statistics, computers, programming languages, data standards, and the internet. Today's "big data" landscape is characterized by the volume, variety, velocity, and veracity of the data being created. The future will involve understanding data through insights.
Collecting, processing, and analyzing large amounts of structured and unstructured data is still a challenge for many companies. Hadoop provides an open source framework for distributed storage and processing of large datasets across commodity servers to help companies gain insights from big data. While Hadoop is commonly used, Spark is becoming a more popular tool that can run up to 100 times faster for iterative jobs and integrates with SQL, machine learning, and streaming technologies. Both Hadoop and Spark often rely on the Hadoop Distributed File System for storage, and they are commonly implemented together in big data projects and in platforms from major vendors.
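The speed difference on iterative jobs comes largely from keeping data in memory between passes. A minimal PySpark sketch of that idea follows; the HDFS path and the toy iteration are assumptions for illustration, not part of the summarized document.

# Why Spark helps iterative jobs: the dataset is cached in memory once and
# reused each pass, instead of being re-read from disk by a chain of
# MapReduce jobs. Path and computation are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iterative-demo").getOrCreate()

points = (spark.sparkContext
          .textFile("hdfs:///data/points.txt")   # assumed input location
          .map(float)
          .cache())                               # keep parsed data in memory

estimate = 0.0
for _ in range(10):                # every iteration reuses the cached RDD
    estimate = 0.5 * (estimate + points.mean())

print(estimate)
spark.stop()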
Introducing the Big Data Ecosystem with Caserta Concepts & Talend (Caserta)
This document summarizes a webinar presented by Talend and Caserta Concepts on the big data ecosystem. The webinar discussed how Talend provides an open source integration platform that scales to handle large data volumes and complex processes. It also reviewed Caserta Concepts' expertise in data management, big data analytics, and industries like financial services. The webinar covered topics like traditional vs. big data, Hadoop and NoSQL technologies, and common integration patterns between traditional data warehouses and big data platforms.
This document discusses big data, its key characteristics of volume, velocity, and variety, and how large amounts of diverse data are being generated from various sources like mobile devices, social media, e-commerce, and emails. It explains that big data analytics can provide competitive advantages and better business decisions by examining large datasets. Hadoop and NoSQL databases are approaches for processing and storing large datasets across distributed systems.
Seattle scalability meetup March 27, 2013 intro slides (clive boulton)
The document summarizes an upcoming meetup about scalability and distributed systems in Seattle. The meetup will include main sessions on Hortonworks and HBase application development by Nick Dimiduk and on Saffron's brain-like analytics by Paul Hofmann. There will also be community announcements, beers afterwards at a nearby restaurant, and the hashtag #seascale for the event.
This document defines key terms related to big data such as structured data, unstructured data, and semi-structured data. It discusses how data is generated from various sources and factors like sensors, social networks, and online shopping. It explains that big data refers to data that is too large to process using traditional methods due to its volume, velocity, and variety. Hadoop is introduced as an open source framework that uses HDFS for distributed storage and MapReduce for distributed processing of large data sets across computer clusters.
This document provides an introduction and overview of Hadoop. It discusses how businesses have been collecting large amounts of data but face challenges in analyzing it due to application complexity, data growth, infrastructure limitations, and economic factors. Hadoop is presented as a solution that can handle high-volume data and perform complex operations at scale, and that is robust and fault tolerant. Key components of Hadoop like HDFS, MapReduce, and the Hadoop ecosystem are described at a high level.
Big data (4Vs, history, concept, algorithm) analysis and applications #bigdata #... (yashbheda)
Big data is generated from various sources like users, systems, and devices. It has grown exponentially due to factors like volume, velocity, variety, and veracity. Analyzing big data helps optimize network resources, improve security monitoring, enable targeted marketing, and enhance performance evaluation. Implementing big data solutions requires strategies for data collection, analysis, storage, and visualization to extract useful insights at scale.
This document discusses big data, including how much data is now being collected, challenges with traditional database management systems, and the need for new approaches like Hadoop and Aster Data. It provides details on characteristics of big data, architectural requirements, techniques for analysis, and solutions from companies like IBM, Teradata, and Aster Data. Hadoop is discussed in depth, covering how it works, the ecosystem, and example users. Aster Data is also summarized, focusing on its massively parallel SQL layer and in-database analytics capabilities.
Big data is characterized by 3 V's - volume, velocity, and variety. It refers to large and complex datasets that are difficult to process using traditional database management tools. Key technologies to handle big data include distributed file systems, Apache Hadoop, data-intensive computing, and tools like MapReduce. Common tools used are infrastructure management tools like Chef and Puppet, monitoring tools like Nagios and Ganglia, and analytics platforms like Netezza and Greenplum.
Science and Research - a new experimental platform in Brazil (ATMOSPHERE)
The document discusses Brazil's cyberinfrastructure and plans for its development. It outlines the current situation including remote collaboration services, remote visualization, distributed software platforms and more. It emphasizes the need to better integrate these resources. The national cyberinfrastructure program for 2020-2022 then details plans to improve the national communication infrastructure, develop academic cloud services, and establish a national open data initiative to organize and support large collaboration projects through services, repositories, and high performance computing resources. The goal is to simplify and promote the use of technologies through a cloud marketplace and integrated services to support research.
Владимир Слободянюк, «DWH & BigData – architecture approaches» (Anna Shymchenko)
This document discusses approaches to data warehouse (DWH) and big data architectures. It begins with an overview of big data, describing its large size and complexity that makes it difficult to process with traditional databases. It then compares Hadoop and relational database management systems (RDBMS), noting pros and cons of each for distributed computing. The document outlines how Hadoop uses MapReduce and has a structure including HDFS, HBase, Hive and Pig. Finally, it proposes using Hadoop as an ETL and data quality tool to improve traceability, reduce costs and handle exception data cleansing more effectively.
Big data refers to large and complex datasets that are difficult to process using traditional data management tools. As data grows from gigabytes to terabytes to petabytes, new techniques are needed to store, process, and analyze this data. Hadoop is an open-source framework that uses distributed storage and processing to handle big data across clusters of computers. It includes HDFS for storage and MapReduce as a programming model for distributed processing of large datasets in parallel.
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,... (Mihai Criveti)
- The document discusses automating data science pipelines with DevOps tools like Ansible, Packer, and Kubernetes.
- It covers obtaining data, exploring and modeling data, and how to automate infrastructure setup and deployment with tools like Packer to build machine images and Ansible for configuration management.
- The rise of DevOps and its cultural aspects are discussed, as well as how tools like Packer, Ansible, and Kubernetes can help automate infrastructure and deploy machine learning models at scale in production environments.
Real-time big data analytics based on product recommendations case study (deep.bi)
We started as an ad network. The challenge was to recommend the best product (out of millions) to the right person at a given moment (thousands of users within a second). We have delivered 5 billion ad views over the past 24 months. To put that in context: if we served 1 ad per second, it would take about 160 years to serve 5 billion ads.
So we needed a solution. SQL databases did not work. Popular NoSQL databases did not work. Standard data warehouse approaches (pre-aggregations, creating schemas) did not work either.
Rethinking all the problems posed by the huge data streams flowing to us every second, we built a complete solution based on open-source technologies and fresh, smart ideas from our engineering team. It is called deep.bi, and we now make it available to other companies.
deep.bi lets high-growth companies solve fast data problems by providing scalable, flexible and real-time data collection, enrichment and analytics.
It was built using:
- Node.js - API
- Kafka - collecting and distributing data
- Spark Streaming - ETL, data enrichments
- Druid - real-time analytics
- Cassandra - user events store
- Hadoop + Parquet + Spark - raw data store + ad-hoc queries
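For a sense of how the collection layer of such a stack fits together, here is a minimal producer sketch using the kafka-python client: an event is serialized to JSON and pushed onto a topic for downstream consumers such as Spark Streaming. The broker address, topic name, and event fields are assumptions, not deep.bi's actual schema.

# Sketch of the collection step: push a user event onto a Kafka topic.
# Broker, topic, and event fields are illustrative assumptions.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

event = {"user_id": 42, "action": "ad_view", "product_id": 1001}
producer.send("user-events", value=event)  # buffered and sent asynchronously
producer.flush()                           # block until the broker confirms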
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at... (Yael Garten)
2017 StrataHadoop SJC conference talk. https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/56047
Description:
So, you finally have a data ecosystem with Kafka and Hadoop both deployed and operating correctly at scale. Congratulations. Are you done? Far from it.
As the birthplace of Kafka and an early adopter of Hadoop, LinkedIn has 13 years of combined experience using Kafka and Hadoop at scale to run a data-driven company. Both Kafka and Hadoop are flexible, scalable infrastructure pieces, but using these technologies without a clear idea of what the higher-level data ecosystem should be is perilous. Shirshanka Das and Yael Garten share best practices around data models and formats, choosing the right level of granularity for Kafka topics and Hadoop tables, and moving data efficiently and correctly between Kafka and Hadoop, and they explore Dali, a data abstraction layer that can help you process data seamlessly across Kafka and Hadoop.
Beyond pure technology, Shirshanka and Yael outline the three components of a great data culture and ecosystem and explain how to create maintainable data contracts between data producers and data consumers (like data scientists and data analysts) and how to standardize data effectively in a growing organization to enable (and not slow down) innovation and agility. They then look to the future, envisioning a world where you can successfully deploy a data abstraction of views on Hadoop data, like a data API as a protective and enabling shield. Along the way, Shirshanka and Yael discuss observations on how to enable teams to be good data citizens in producing, consuming, and owning datasets and offer an overview of LinkedIn’s governance model: the tools, process and teams that ensure that its data ecosystem can handle change and sustain #DataScienceHappiness.
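One common way to make such a producer/consumer data contract concrete is a schema that both sides validate against. Below is a small sketch using Avro via the fastavro library; the event name and fields are invented for illustration and are not LinkedIn's actual formats (Dali itself is not shown).

# A data contract as an Avro schema: writes that violate it fail fast.
# Schema fields are illustrative, not LinkedIn's real event format.
from io import BytesIO
from fastavro import parse_schema, schemaless_writer, schemaless_reader

schema = parse_schema({
    "type": "record",
    "name": "PageViewEvent",
    "fields": [
        {"name": "member_id", "type": "long"},
        {"name": "page", "type": "string"},
        {"name": "timestamp_ms", "type": "long"},
    ],
})

buf = BytesIO()
# Writing validates the event against the contract before it ships.
schemaless_writer(buf, schema, {"member_id": 7, "page": "/feed", "timestamp_ms": 1})
buf.seek(0)
print(schemaless_reader(buf, schema))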
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha... (Shirshanka Das)
This document provides an overview of big data and how Azure HDInsight can be used to work with big data. It discusses the evolution of data from gigabytes to exabytes and the big data utility gap where most data is stored but not analyzed. It then discusses how to store everything, analyze anything, and build the right thing using big data. Examples are provided of companies generating large amounts of data. An overview of the Hadoop ecosystem is given along with examples of using Hive and Pig on HDInsight to query and analyze large datasets. A case study of Klout is also summarized.
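To give a flavour of the Hive queries such a summary refers to, here is a hedged sketch that submits HiveQL from Python with the PyHive library; the host, table, and columns are assumptions, and on HDInsight the equivalent query would go through the cluster's own Hive endpoint.

# Sketch: run a HiveQL aggregation from Python via PyHive.
# Host, port, table, and columns are illustrative assumptions.
from pyhive import hive

conn = hive.Connection(host="localhost", port=10000, username="hdiuser")
cursor = conn.cursor()
cursor.execute(
    "SELECT page, COUNT(*) AS views "
    "FROM weblogs GROUP BY page ORDER BY views DESC LIMIT 10"
)
for page, views in cursor.fetchall():
    print(page, views)
conn.close()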
This document provides an overview of how to build your own personalized search and discovery tool like Microsoft Delve by combining machine learning, big data, and SharePoint. It discusses the Office Graph and how signals across Office 365 are used to populate insights. It also covers big data concepts like Hadoop and machine learning algorithms. Finally, it proposes a high-level architectural concept for building a Delve-like tool using Azure SQL Database, Azure Storage, Azure Machine Learning, and presenting insights.
How to build your own Delve: combining machine learning, big data and SharePoint (Joris Poelmans)
You experience the benefits of machine learning every day through product recommendations on Amazon and Bol.com, credit card fraud prevention, and so on. So how can we leverage machine learning together with SharePoint and Yammer? We will first look into the fundamentals of machine learning and big data solutions, and then we will explore how we can combine tools such as Windows Azure HDInsight, R, and Azure Machine Learning to extend and support collaboration and content management scenarios within your organization.
This document provides an introduction to a course on big data and analytics. It outlines the instructor and teaching assistant contact information. It then lists the main topics to be covered, including data analytics and mining techniques, Hadoop/MapReduce programming, graph databases and analytics. It defines big data and discusses the 3Vs of big data - volume, variety and velocity. It also covers big data technologies like cloud computing, Hadoop, and graph databases. Course requirements and the grading scheme are outlined.
This is a talk about Big Data, focusing on its impact on all of us. It also encourages institutions to take a close look at providing courses in this area.
This document summarizes a talk on using big data driven solutions to combat COVID-19. It discusses how big data preparation involves ingesting, cleansing, and enriching data from various sources. It also describes common big data technologies used for storage, mining, analytics and visualization including Hadoop, Presto, Kafka and Tableau. Finally, it provides examples of research projects applying big data and AI to track COVID-19 cases, model disease spread, and optimize health resource utilization.
How to build and run a big data platform in the 21st century (Ali Dasdan)
The document provides an overview of big data platform architectures that have been built by various companies and organizations. It discusses self-built platforms from companies like Airbnb, Netflix, Facebook, Slack, and Uber. It also covers cloud-built platforms on IBM Cloud, Microsoft Azure, Google Cloud, and Amazon AWS. Consulting-built platforms from Cloudera and ThoughtWorks are presented. Finally, it introduces the NIST Big Data Reference Architecture as a standard reference model and discusses generic batch vs streaming architectures like Lambda and Kappa.
IIPGH Webinar 1: Getting Started With Data Science (ds4good)
In this webinar for ICT Professionals Ghana, we explore the concepts of data science and its motivations as a recent specialization, creating the background for how Artificial Intelligence relates to Machine Learning and Deep Learning. We further discuss the data science technology stack and the opportunities that exist in the space.
Data Science at Scale - The DevOps Approach (Mihai Criveti)
DevOps Practices for Data Scientists and Engineers
1 Data Science Landscape
2 Process and Flow
3 The Data
4 Data Science Toolkit
5 Cloud Computing Solutions
6 The rise of DevOps
7 Reusable Assets and Practices
8 Skills Development
This document provides an overview of big data concepts including definitions of big data, sources of big data, and uses of big data analytics. It discusses technologies used for big data including Hadoop, MapReduce, Hive, Mahout, MATLAB, and Revolution R. It also addresses challenges around big data such as lack of standardization and extracting meaningful insights from large datasets.
Webinar: How Banks Use MongoDB as a Tick Database (MongoDB)
Learn why MongoDB is spreading like wildfire across capital markets (and really every industry) and then focus in particular on how financial firms are enjoying the developer productivity, low TCO, and unlimited scale of MongoDB as a tick database for capturing, analyzing, and taking advantage of opportunities in tick data.
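As a rough sketch of what a tick database on MongoDB can look like, the PyMongo snippet below inserts a tick and runs a simple aggregation; the connection string, collection layout, and fields are assumptions for illustration, not the webinar's reference design.

# Sketch of a tick store: insert raw ticks, index by symbol and time,
# then aggregate. Connection string and fields are illustrative.
from datetime import datetime, timezone
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
ticks = client["market"]["ticks"]
ticks.create_index([("symbol", ASCENDING), ("ts", ASCENDING)])

ticks.insert_one({
    "symbol": "IBM",
    "ts": datetime.now(timezone.utc),
    "price": 171.25,
    "size": 100,
})

# Average price and total volume per symbol -- a typical tick aggregation.
pipeline = [{"$group": {
    "_id": "$symbol",
    "avg_price": {"$avg": "$price"},
    "volume": {"$sum": "$size"},
}}]
for row in ticks.aggregate(pipeline):
    print(row)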
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ... (Big Data Spain)
http://www.bigdataspain.org/2014/conference/state-of-play-data-science-on-hadoop-in-2015-keynote
Machine Learning is not new. Big machine learning is qualitatively different: more data beats algorithm improvements, scale trumps noise and sample-size effects, and manual tasks can be brute-forced.
Session presented at Big Data Spain 2014 Conference
18th Nov 2014
Kinépolis Madrid
http://www.bigdataspain.org
Event promoted by: http://www.paradigmatecnologico.com
Slides: https://speakerdeck.com/bigdataspain/state-of-play-data-science-on-hadoop-in-2015-by-sean-owen-at-big-data-spain-2014
The document discusses big data and how it differs from traditional IT approaches. It defines big data using the four V's - volume, velocity, variety, and variability. Technologies used for big data like Hadoop, MapReduce, and NoSQL databases are outlined. Differences between big data infrastructure and traditional IT infrastructure and BI are explored. Examples of how Orbitz and the DoD use big data are provided. The business value of big data analytics is discussed as enabling new types of analysis and insights not previously possible.
TechWise with Eric Kavanagh, Dr. Robin Bloor and Dr. Kirk Borne
Live Webcast on July 23, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=59d50a520542ee7ed00a0c38e8319b54
Analytical applications are everywhere these days, and for good reason. Organizations large and small are using analytics to better understand any aspect of their business: customers, processes, behaviors, even competitors. There are several critical success factors for using analytics effectively: 1) know which kinds of apps make sense for your company; 2) figure out which data sets you can use, both internal and external; 3) determine optimal roles and responsibilities for your team; 4) identify where you need help, either by hiring new employees or using consultants; 5) manage your program effectively over time.
Register for this episode of TechWise to learn from two of the most experienced analysts in the business: Dr. Robin Bloor, Chief Analyst of The Bloor Group, and Dr. Kirk Borne, Data Scientist, George Mason University. Each will provide their perspective on how companies can address each of the key success factors in building, refining and using analytics to improve their business. There will then be an extensive Q&A session in which attendees can ask detailed questions of our experts and get answers in real time. Registrants will also receive a consolidated deck of slides, not just from the main presenters, but also from a variety of software vendors who provide targeted solutions.
Visit InsideAnalysis.com for more information.
Similar to Open source for customer analytics (20)
Unveiling the Advantages of Agile Software Development.pdf (brainerhub1)
Learn about the advantages of Agile software development and simplify your workflow to spur faster innovation. Jump right in!
WWDC 2024 Keynote Review: For CocoaCoders Austin (Patrick Weigel)
Overview of WWDC 2024 Keynote Address.
Covers: Apple Intelligence, iOS18, macOS Sequoia, iPadOS, watchOS, visionOS, and Apple TV+.
Understandable dialogue on Apple TV+
On-device, app-controlling AI.
Access to ChatGPT with a guest appearance by Chief Data Thief Sam Altman!
App Locking! iPhone Mirroring! And a Calculator!!
The Key to Digital Success: A Comprehensive Guide to Continuous Testing Integ... (kalichargn70th171)
In today's business landscape, digital integration is ubiquitous, demanding swift innovation as a necessity rather than a luxury. In a fiercely competitive market with heightened customer expectations, the timely launch of flawless digital products is crucial for both acquisition and retention—any delay risks ceding market share to competitors.
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions (Peter Muessig)
The UI5 tooling is the development and build tooling of UI5. It is built in a modular and extensible way so that it can easily be extended to meet your needs. This session showcases various tooling extensions that can considerably boost your development experience: truly working offline, transpiling the code in your project to use even newer versions of ECMAScript (beyond 2022, which the UI5 tooling supports today), consuming any npm package of your choice, using different kinds of proxies, and even stitching UI5 projects together during development to mimic your target environment.
Consistent toolbox talks are critical for maintaining workplace safety, as they provide regular opportunities to address specific hazards and reinforce safe practices.
These brief, focused sessions ensure that safety is a continual conversation rather than a one-time event, which helps keep safety protocols fresh in employees' minds. Studies have shown that shorter, more frequent training sessions are more effective for retention and behavior change compared to longer, infrequent sessions.
By engaging workers regularly, toolbox talks promote a culture of safety, empower employees to voice concerns, and ultimately reduce the likelihood of accidents and injuries on site.
The traditional method of conducting safety talks with paper documents and lengthy meetings is not only time-consuming but also less effective. Manual tracking of attendance and compliance is prone to errors and inconsistencies, leading to gaps in safety communication and potential non-compliance with OSHA regulations. Switching to a digital solution like Safelyio offers significant advantages.
Safelyio automates the delivery and documentation of safety talks, ensuring consistency and accessibility. The microlearning approach breaks down complex safety protocols into manageable, bite-sized pieces, making it easier for employees to absorb and retain information.
This method minimizes disruptions to work schedules, eliminates the hassle of paperwork, and ensures that all safety communications are tracked and recorded accurately. Ultimately, using a digital platform like Safelyio enhances engagement, compliance, and overall safety performance on site. https://safelyio.com/
Microservice Teams - How the cloud changes the way we work (Sven Peters)
A lot of technical challenges and complexity come with building a cloud-native and distributed architecture. The way we develop backend software has fundamentally changed in the last ten years. Managing a microservices architecture demands a lot from us to ensure observability and operational resiliency. But did you also change the way you run your development teams?
Sven will talk about Atlassian’s journey from a monolith to a multi-tenanted architecture and how it affected the way the engineering teams work. You will learn how we shifted to service ownership, moved to more autonomous teams (and its challenges), and established platform and enablement teams.
UI5con 2024 - Keynote: Latest News about UI5 and its Ecosystem (Peter Muessig)
Learn about the latest innovations in and around OpenUI5/SAPUI5: UI5 Tooling, UI5 linter, UI5 Web Components, Web Components Integration, UI5 2.x, UI5 GenAI.
Recording:
https://www.youtube.com/live/MSdGLG2zLy8?si=INxBHTqkwHhxV5Ta&t=0
8 Best Automated Android App Testing Tool and Framework in 2024.pdf (kalichargn70th171)
Regarding mobile operating systems, two major players dominate our thoughts: Android and iPhone. With Android leading the market, software development companies are focused on delivering apps compatible with this OS. Ensuring an app's functionality across various Android devices, OS versions, and hardware specifications is critical, making Android app testing essential.
INTRODUCTION TO AI CLASSICAL THEORY TARGETED EXAMPLES (anfaltahir1010)
Image: Include an image that represents the concept of precision, such as an AI helix or a futuristic healthcare setting.
Objective: Provide a foundational understanding of precision medicine and its departure from traditional approaches.
Role of theory: Discuss how genomics, the study of an organism's complete set of AI, plays a crucial role in precision medicine.
Customizing treatment plans: Highlight how genetic information is used to customize treatment plans based on an individual's genetic makeup.
Examples: Provide real-world examples of successful applications of AI, such as genetic therapies or targeted treatments.
Importance of molecular diagnostics: Explain the role of molecular diagnostics in identifying molecular and genetic markers associated with diseases.
Biomarker testing: Showcase how biomarker testing aids in creating personalized treatment plans.
Content:
• Ethical issues: Examine ethical concerns related to precision medicine, such as privacy, consent, and potential misuse of genetic information.
• Regulations and guidelines: Present examples of ethical guidelines and regulations in place to safeguard patient rights.
• Visuals: Include images or icons representing ethical considerations.
Real-world case study: Present a detailed case study showcasing the success of precision medicine in a specific medical scenario.
Patient's journey: Discuss the patient's journey, treatment plan, and outcomes.
Impact: Emphasize the transformative effect of precision medicine on the individual's health.
Objective: Ground the presentation in a real-world example, highlighting the practical application and success of precision medicine.
Data challenges: Address the challenges associated with managing large sets of patient data in precision medicine.
Technological solutions: Discuss technological innovations and solutions for handling and analyzing vast datasets.
Visuals: Include graphics representing data management challenges and technological solutions.
Objective: Acknowledge the data-related challenges in precision medicine and highlight innovative solutions.
14th Edition of International Conference on Computer Vision (ShulagnaSarkar2)
About the event
14th Edition of International conference on computer vision
Computer conferences organized by the ScienceFather group. ScienceFather takes the privilege of inviting speakers, participants, students, delegates, and exhibitors from across the globe to its international computer conferences, to be held in various beautiful cities of the world. The conferences are a discussion of common invention-related issues and, additionally, a place to trade information and share thoughts and insight into advanced developments in the science inventions service system. New technology may create many materials and devices with a vast range of applications, such as in science, medicine, electronics, biomaterials, energy production, and consumer products.
Nominations are open! Don't miss it.
Visit: computer.scifat.com
Award Nomination: https://x-i.me/ishnom
Conference Submission: https://x-i.me/anicon
For Enquiry: Computer@scifat.com
What to do when you have a perfect model for your software but you are constrained by an imperfect business model?
This talk explores the challenges of bringing modelling rigour to the business and strategy levels, and talking to your non-technical counterparts in the process.
Odoo releases a new update every year. The latest version, Odoo 17, came out in October 2023. It brought many improvements to the user interface and user experience, along with new features in modules like accounting, marketing, manufacturing, websites, and more.
The Odoo 17 update has been a hot topic among startups, mid-sized businesses, large enterprises, and Odoo developers aiming to grow their businesses. Now that we are already in the first quarter of 2024, if you are still not aware of Odoo 17, you should get a clear idea of what it entails and what it can offer your business.
This blog covers the features and functionalities. Explore the entire blog and get in touch with expert Odoo ERP consultants to leverage Odoo 17 and its features for your business too.
An Overview of Odoo ERP
Odoo ERP was first released as OpenERP software in February 2005. It is a suite of business applications used for ERP, CRM, eCommerce, websites, and project management. Ten years ago, the Odoo Enterprise edition was launched to help fund the Odoo Community version.
When you compare Odoo Community and Enterprise, the Enterprise edition offers exclusive features like mobile app access, Odoo Studio customisation, Odoo hosting, and unlimited functional support.
Today, Odoo is a well-known name used by companies of all sizes across various industries, including manufacturing, retail, accounting, marketing, healthcare, IT consulting, and R&D.
The latest version, Odoo 17, has been available since October 2023. Key highlights of this update include:
Enhanced user experience with improvements to the command bar, faster backend page loading, and multiple dashboard views.
Instant report generation, credit limit alerts for sales and invoices, separate OCR settings for invoice creation, and an auto-complete feature for forms in the accounting module.
Improved image handling and global attribute changes for mailing lists in email marketing.
A default auto-signature option and a refuse-to-sign option in HR modules.
Options to divide and merge manufacturing orders, track the status of manufacturing orders, and more in the MRP module.
Dark mode in Odoo 17.
Now that the Odoo 17 announcement is official, let’s look at what’s new in Odoo 17!
What is Odoo ERP 17?
Odoo 17 is the latest version of one of the world's leading open-source enterprise ERPs. This version comes with the significant improvements explained in this blog, and it aims to introduce features that enhance time-saving, efficiency, and productivity for users across various organisations.
Odoo 17, released at the Odoo Experience 2023, brought notable improvements to the user interface and added new functionalities with enhancements in performance, accessibility, data analysis, and management, further expanding its reach in the market.
6. Benefits, Drawbacks & Facts
Benefits
● No Licence Cost
● Huge amount of knowledge in the community
● High speed of innovation
● Funny names
Drawbacks
● Overwhelming choices
● Varying maturity
● Skills challenge (for newer projects)
Facts of Life
● Professional Services / Support not free
8. Popular Data Products
Google Flights (not a booking engine!)
CIA World Fact Book (simple presentation)
Inside AirBnB (“activist”)
data.gov.uk
10. The Data Process
1. Obtain data
2. Explore & clean data
3. Analyse & model
4. Visualise
5. Productionise & automate
Data Pipeline:
a. How and where to distribute?
b. How to scale?
c. How to secure?
d. How to manage day-to-day?
12. Using ggplot2 for exploratory graphs
library(ggplot2)  # qplot() comes from ggplot2
qplot(host$availability_365,
      geom = "histogram",
      binwidth = 5,
      main = "Histogram for Availability",
      xlab = "AirBnB in London",
      fill = I("blue"))
13. Statistical Analysis
SIMPLE
● Sum, Count, Mean / Median
● Variance / Standard Deviation
E.g. Average Revenue per User per Neighbourhood (by Month of the Year)
MORE COMPLEX
● Clustering
● Co-variance matrix (dependencies between variables)
● Predictive Models
● Machine Learning
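As a worked example of the "simple" statistics above, the pandas sketch below computes average revenue per user per neighbourhood, by month; the tiny inline dataset and its column names are assumptions standing in for the AirBnB data the deck uses.

# Average Revenue per User per Neighbourhood (by Month), in pandas.
# The inline dataset and column names are illustrative assumptions.
import pandas as pd

bookings = pd.DataFrame({
    "neighbourhood": ["Camden", "Camden", "Hackney", "Hackney"],
    "month":         [1, 1, 1, 2],
    "user_id":       [10, 11, 12, 12],
    "revenue":       [120.0, 80.0, 95.0, 60.0],
})

revenue_per_user = (
    bookings
    .groupby(["neighbourhood", "month", "user_id"])["revenue"].sum()
    .groupby(["neighbourhood", "month"]).mean()   # mean over users
)
print(revenue_per_user)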
14. Big Data Architectures (simplified)
“Big” Database or Hadoop Cluster / File System (storage)
Query Engine (Data Access)
Execution Engine (Business Logic)
Search Engine (Accessibility)
Visualisation Layer
17. Interactive Notebooks
New breed of software to work interactively on data
● Spark/Scala Notebook
● Apache Zeppelin
● Databricks: cloud (proprietary but built on Spark)