Building a Data Strategy – Practical Steps for Aligning with Business Goals (DATAVERSITY)
Developing a Data Strategy for your organization can seem like a daunting task – but it’s worth the effort. Getting your Data Strategy right can provide significant value, as data drives many of the key initiatives in today’s marketplace – from digital transformation, to marketing, to customer centricity, to population health, and more. This webinar will help demystify Data Strategy and its relationship to Data Architecture and will provide concrete, practical ways to get started.
Emerging Trends in Data Architecture – What’s the Next Big Thing? (DATAVERSITY)
Digital Transformation is a top priority for many organizations, and a successful digital journey requires a strong data foundation built on core data management capabilities such as MDM. With technological innovation and change occurring at an ever-increasing rate, it’s hard to keep track of what’s hype and what can provide practical value for your organization. Join this webinar to see the results of a recent DATAVERSITY survey on emerging trends in Data Architecture, along with practical commentary and advice from industry expert Donna Burbank.
Data Governance — Aligning Technical and Business Approaches (DATAVERSITY)
Data Governance can have a varied definition, depending on the audience. To many, data governance consists of committee meetings and stewardship roles. To others, it focuses on technical data management and controls. Holistic data governance combines both of these aspects, and a robust data architecture and associated diagrams can be the “glue” that binds business and IT governance together. Join this webinar for practical tips and hands-on exercises for aligning data architecture & data governance for business and IT success.
Business Intelligence & Data Analytics – An Architected Approach (DATAVERSITY)
Business intelligence (BI) and data analytics are increasing in popularity as more organizations are looking to become more data-driven. Many tools have powerful visualization techniques that can create dynamic displays of critical information. To ensure that the data displayed on these visualizations is accurate and timely, a strong Data Architecture is needed. Join this webinar to understand how to create a robust Data Architecture for BI and data analytics that takes both business and technology needs into consideration.
To take a “ready, aim, fire” approach to implementing Data Governance, many organizations assess themselves against industry best practices. The process is not difficult or time-consuming, and it can directly ensure that your activities target your specific needs. Best practices are always a strong place to start.
Join Bob Seiner for this popular RWDG topic, where he will provide the information you need to set your program in the best possible direction. Bob will walk you through the steps of conducting an assessment and share with you a set of typical results from taking this action. You may be surprised at how easy it is to organize the assessment and may hear results that stimulate the actions that you need to take.
In this webinar, Bob will share:
- The value of performing a Data Governance best practice assessment
- A practical list of industry Data Governance best practices
- Criteria to determine whether a practice is a best practice
- Steps to follow to complete an assessment
- Typical recommendations and actions that result from an assessment
When it comes to creating an enterprise AI strategy: if your company isn’t good at analytics, it’s not ready for AI. Succeeding in AI requires being good at data engineering AND analytics. Unfortunately, management teams often assume they can leapfrog best practices for basic data analytics by directly adopting advanced technologies such as ML/AI – setting themselves up for failure from the get-go. This presentation explains how to get basic data engineering and the right technology in place to create and maintain data pipelines so that you can solve problems with AI successfully.
Big Data Analytics Powerpoint Presentation Slide (SlideTeam)
Whether you need to analyze problems in your management systems or simply present complex data to your team, SlideTeam’s PowerPoint slides for big data analytics can help. Data analysis agendas and big data plans are presented through clear icons and subheadings for a precise and engaging approach. This PPT slide is useful for presenting business and marketing topics, drawing the right conclusions, and keeping track of business growth. Most elements of the slide are highly customizable, and the text boxes let you add more information about each point and its associated icon. Every detail in the Big Data Analytics Powerpoint Presentation Slide has been double-checked, so you can be certain of its accuracy. https://bit.ly/3fvnRVK
Data Lake Architecture – Modern Strategies & Approaches (DATAVERSITY)
Data Lake or Data Swamp? By now, we’ve likely all heard the comparison. Data Lake architectures offer the opportunity to integrate vast amounts of disparate data across the organization for strategic business analytic value. But without a proper architecture and metadata management strategy in place, a Data Lake can quickly devolve into a swamp of information that is difficult to understand. This webinar will offer practical strategies to architect and manage your Data Lake in a way that optimizes its success.
Gartner: Master Data Management Functionality (Gartner)
Gartner will further examine key trends shaping the future MDM market during the Gartner MDM Summit 2011, 2-3 February in London. More information at www.europe.gartner.com/mdm
In this session, Sergio covered the Lakehouse concept and how companies implement it, from data ingestion to insight. He showed how you can use Azure Data Services to speed up your analytics project, from ingesting and modelling data to delivering insights to end users.
Data Catalogs Are the Answer – What is the Question? (DATAVERSITY)
Organizations with governed metadata made available through their data catalog can answer questions their people have about the organization’s data. These organizations get more value from their data, protect their data better, gain improved ROI from data-centric projects and programs, and have more confidence in their most strategic data.
Join Bob Seiner for this lively webinar where he will talk about the value of a data catalog and how to build the use of the catalog into your stewards’ daily routines. Bob will share how the tool must be positioned for success and viewed as a must-have resource that is a steppingstone and catalyst to governed data across the organization.
Big Data & Analytics (Conceptual and Practical Introduction) (Yaman Hajja, Ph.D.)
A 3-day interactive workshop for startups involved in Big Data & Analytics in Asia. An introduction to Big Data & Analytics concepts, with case studies in R programming, Excel, Web APIs, and more.
DOI: 10.13140/RG.2.2.10638.36162
You had a strategy. You were executing it. You were then side-swiped by COVID, spending countless cycles blocking and tackling. It is now time to step back onto your path.
CCG is holding a workshop to help you update your roadmap, get your team back on track, and review how Microsoft Azure Solutions can be leveraged to build a strong foundation for governed data insights.
Data Architecture, Solution Architecture, Platform Architecture — What’s the ... (DATAVERSITY)
A solid data architecture is critical to the success of any data initiative. But what is meant by “data architecture”? Throughout the industry, there are many different “flavors” of data architecture, each with its own unique value and use cases for describing key aspects of the data landscape. Join this webinar to demystify the various architecture styles and understand how they can add value to your organization.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2OUz6dt.
Chris Riccomini talks about the current state-of-the-art in data pipelines and data warehousing, and shares some of the solutions to current problems dealing with data streaming and warehousing. Filmed at qconsf.com.
Chris Riccomini works as a Software Engineer at WePay.
Improving Data Literacy Around Data Architecture (DATAVERSITY)
Data Literacy is an increasing concern, as organizations look to become more data-driven. As the rise of the citizen data scientist and self-service data analytics becomes increasingly common, the need for business users to understand core Data Management fundamentals is more important than ever. At the same time, technical roles need a strong foundation in Data Architecture principles and best practices. Join this webinar to understand the key components of Data Literacy, and practical ways to implement a Data Literacy program in your organization.
Driving Data Intelligence in the Supply Chain Through the Data Catalog at TJX (DATAVERSITY)
Roles and responsibilities are a critical component of every Data Governance program. Building a set of roles that are practical and that will not interfere with people’s “day jobs” is an important consideration that will influence how well your program is adopted. This tutorial focuses on sharing a proven model guaranteed to represent your organization.
Join Bob Seiner for this lively webinar where he will dissect a complete Operating Model of Roles and Responsibilities that encompasses all levels of the organization. Seiner will detail the roles and describe the most effective way to associate people with the roles. You will walk out of this webinar with a model to apply to your organization.
In this session Bob will share:
- The five levels of Data Governance roles
- A proven Operating Model of Roles and Responsibilities
- How to customize the model to meet your requirements
- Setting appropriate role expectations
- How to operationalize the roles and demonstrate value
DAS Slides: Building a Data Strategy — Practical Steps for Aligning with Busi... (DATAVERSITY)
Developing a Data Strategy for your organization can seem like a daunting task. The opportunity in getting it right can be significant, however, as data drives many of the key initiatives in today’s marketplace, from digital transformation to marketing, customer centricity, population health, and more. This webinar will help demystify data strategy and data architecture and will provide concrete, practical ways to get started.
Enabling a Data Mesh Architecture with Data Virtualization (Denodo)
Watch full webinar here: https://bit.ly/3rwWhyv
The Data Mesh architectural design was first proposed in 2019 by Zhamak Dehghani, principal technology consultant at Thoughtworks, a technology company that is closely associated with the development of distributed agile methodology. A data mesh is a distributed, decentralized data infrastructure in which multiple autonomous domains manage and expose their own data, called “data products,” to the rest of the organization.
Organizations adopt data mesh architecture when they experience shortcomings in highly centralized architectures, such as the lack of domain-specific expertise in data teams, the inflexibility of centralized data repositories in meeting the specific needs of different departments within large organizations, and the slowness of centralized data infrastructures in provisioning data and responding to changes.
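To make the “data product” idea concrete, here is a minimal, hypothetical Python sketch (all names are invented for illustration, not Denodo’s API): each domain owns its product and its access method, while a lightweight registry, the role a virtualization layer can play, gives consumers one place to discover products without knowing how each domain stores its data.

    from dataclasses import dataclass
    from typing import Callable, Dict, List

    @dataclass
    class DataProduct:
        domain: str                     # owning domain, e.g. "payments"
        name: str                       # product name, e.g. "settled_transactions"
        schema: Dict[str, str]          # column -> type: the published contract
        read: Callable[[], List[dict]]  # domain-controlled access method

    class ProductRegistry:
        """Central discovery, decentralized ownership."""
        def __init__(self) -> None:
            self._products: Dict[str, DataProduct] = {}

        def publish(self, product: DataProduct) -> None:
            self._products[f"{product.domain}.{product.name}"] = product

        def get(self, qualified_name: str) -> DataProduct:
            return self._products[qualified_name]

    # A domain team publishes its product; a consumer reads it without
    # knowing how or where the domain stores the underlying data.
    registry = ProductRegistry()
    registry.publish(DataProduct(
        domain="payments",
        name="settled_transactions",
        schema={"txn_id": "string", "amount": "decimal"},
        read=lambda: [{"txn_id": "t1", "amount": 9.99}],
    ))
    rows = registry.get("payments.settled_transactions").read()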
In this session, Pablo Alvarez, Global Director of Product Management at Denodo, explains how data virtualization is your best bet for implementing an effective data mesh architecture.
You will learn:
- How data mesh architecture not only enables better performance and agility, but also self-service data access
- The requirements for “data products” in the data mesh world, and how data virtualization supports them
- How data virtualization enables domains in a data mesh to be truly autonomous
- Why a data lake is not automatically a data mesh
- How to implement a simple, functional data mesh architecture using data virtualization
Creating a clearly articulated data strategy—a roadmap of technology-driven capability investments prioritized to deliver value—helps ensure from the get-go that you are focusing on the right things, so that your work with data has a business impact. In this presentation, the experts at Silicon Valley Data Science share their approach for crafting an actionable and flexible data strategy to maximize business value.
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat... (Hortonworks)
How do you turn data from many different sources into actionable insights and manufacture those insights into innovative information-based products and services?
Industry leaders are accomplishing this by adding Hadoop as a critical component in their modern data architecture to build a data lake. A data lake collects and stores data across a wide variety of channels including social media, clickstream data, server logs, customer transactions and interactions, videos, and sensor data from equipment in the field. A data lake cost-effectively scales to collect and retain massive amounts of data over time, and convert all this data into actionable information that can transform your business.
Join Hortonworks and Informatica as we discuss:
- What is a data lake?
- The modern data architecture for a data lake
- How Hadoop fits into the modern data architecture
- Innovative use-cases for a data lake
Big data architectures and the data lake (James Serra)
With so many new technologies, it can be confusing to choose the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs. bottom-up approach to analytics, and how you can use a data lake and an RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S... (Capgemini)
Rip and replace isn't a good approach to IT change. When looking at Hadoop, MPP, in-memory, and predictive analytics, the challenge is making them coexist with current solutions.
Learn how Capgemini’s Pivotal CoE utilizes Cloud Foundry and PivotalOne to help businesses adopt new technologies without losing the value of current investments.
Presented by Michael Wood of Pivotal and Steve Jones, Global Director, Strategy, Big Data and Analytics, Capgemini, at EMC World 2014.
Big Data beyond Apache Hadoop - How to integrate ALL your Data (Kai Wähner)
Big data represents a significant paradigm shift in enterprise technology. Big data radically changes the nature of the data management profession as it introduces new concerns about the volume, velocity and variety of corporate data.
Apache Hadoop is the open source de facto standard for implementing big data solutions on the Java platform. Hadoop consists of its kernel, MapReduce, and the Hadoop Distributed Filesystem (HDFS). A challenging task is to send all data to Hadoop for processing and storage (and then get it back to your application later), because in practice data comes from many different applications (SAP, Salesforce, Siebel, etc.) and databases (File, SQL, NoSQL), uses different technologies and concepts for communication (e.g. HTTP, FTP, RMI, JMS), and consists of different data formats such as CSV, XML, binary data, or other alternatives.
This session shows the powerful combination of Apache Hadoop and Apache Camel to solve this challenging task. Learn how to use every conceivable kind of data with Hadoop – without a lot of complex or redundant boilerplate code. Besides supporting the integration of all the different technologies and data formats, Apache Camel also offers an easy, standardized DSL to transform, split, or filter incoming data using the Enterprise Integration Patterns (EIP). Therefore, Apache Hadoop and Apache Camel are a perfect match for processing big data on the Java platform.
Exploring How to Use Hadoop in your Healthcare Big Data Strategy (Health Catalyst)
Big Data, Big Data, Big Data – everybody is talking about it, but what is it, why are people talking about it, and how is it being done? Come ready to talk about emerging healthcare big data use cases that are pleading for the help of practical and powerful technologies like Spark, Hive, and others. If applied appropriately, these technologies can rev up your data warehouse and help you to address evolving data-driven healthcare needs around unstructured data, real-time data feeds, and machine learning.
Sean Stohl, SVP of Product Development at Health Catalyst, will give you a practical understanding of where to get started with these technologies. Sean will also give you a glimpse of how he thinks these technologies will evolve over time in this technically focused webinar.
Attendees will be able to explain:
What Big Data and Hadoop are
Why Big Data and Hadoop are needed in healthcare
What the challenges to adoption are
How to get started
Attendees will also get to see Big Data in action. We look forward to you joining us.
Online Diabetes: Inferring Community Structure in Healthcare Forums (Luis Fernandez Luque)
Inferring community structure in healthcare forums. An empirical study by Chomutare T, Arsand E, Fernandez-Luque L, Lauritzen J, Hartvigsen G. Methods Inf Med. 2013;52(2):160-7. https://www.ncbi.nlm.nih.gov/pubmed/23392282
Abstract
BACKGROUND:
Detecting community structures in complex networks is a problem interesting to several domains. In healthcare, discovering communities may enhance the quality of web offerings for people with chronic diseases. Understanding the social dynamics and community attachments is key to predicting and influencing interaction and information flow to the right patients.
OBJECTIVES:
The goal of the study is to empirically assess the extent to which we can infer meaningful community structures from implicit networks of peer interaction in online healthcare forums.
METHODS:
We used datasets from five online diabetes forums to design networks based on peer-interactions. A quality function based on user interaction similarity was used to assess the quality of the discovered communities to complement existing homophily measures.
RESULTS:
Results show that we can infer meaningful communities by observing forum interactions. Closely similar users tended to co-appear in the top communities, suggesting the discovered communities are intuitive. The number of years since diagnosis was a significant factor for cohesiveness in some diabetes communities.
CONCLUSION:
Network analysis is a tool that can be useful in studying implicit networks that form in healthcare forums. Current analysis informs further work on predicting and influencing interaction, information flow and user interests that could be useful for personalizing medical social media.
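As a hedged illustration of this kind of network analysis (a toy in Python with NetworkX, not the authors' exact method or data), the sketch below builds a small peer-interaction graph and detects communities by greedy modularity maximization:

    # Nodes are forum users; an edge means two users interacted in a thread.
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    G = nx.Graph()
    G.add_edges_from([
        ("ann", "bob"), ("bob", "cai"), ("ann", "cai"),  # one cluster of peers
        ("dee", "eli"), ("eli", "fay"), ("dee", "fay"),  # another cluster
        ("cai", "dee"),                                  # weak bridge between them
    ])

    for i, members in enumerate(greedy_modularity_communities(G)):
        print(f"community {i}: {sorted(members)}")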
Incorporating the Data Lake into Your Analytic Architecture (Caserta)
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented Incorporating the Data Lake into Your Analytics Architecture.
For more information on the services offered by Caserta Concepts, visit our website at http://casertaconcepts.com/.
Logical Data Warehouses and Data Lakes can play a role in many different types of projects, and in this presentation we will look at some of the most common patterns and use cases. Learn about analytical and big data patterns as well as performance considerations. Example implementations will be discussed for each pattern.
- Architectural patterns for logical data warehouse and data lakes.
- Performance considerations.
- Customer use cases and demo.
This presentation is part of the Denodo Educational Seminar, and you can watch the video here: goo.gl/vycYmZ.
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ... (Cloudera, Inc.)
The Hadoop ecosystem has improved real-time access capabilities recently, narrowing the gap with relational database technologies. However, gaps remain in the storage layer that complicate the transition to Hadoop-based architectures. In this session, the presenter will describe these gaps and discuss the tradeoffs between real-time transactional access and fast analytic performance from the perspective of storage engine internals. The session also will cover Kudu (currently in beta), the new addition to the open source Hadoop ecosystem with out-of-the-box integration with Apache Spark and Apache Impala (incubating), that achieves fast scans and fast random access from a single API.
Enablers, platforms, and early adopters for the Internet of Things. How does Hadoop help in enabling the technology to process data from sensors? What are the limitations of using Hadoop for the Internet of Things?
Extracting value from Big Data is not easy. The field of technologies and vendors is fragmented and rapidly evolving. End-to-end, general purpose solutions that work out of the box don’t exist yet, and Hadoop is no exception. And most companies lack Big Data specialists. The key to unlocking real value lies with thinking smart and hard about the business requirements for a Big Data solution. There is a long list of crucial questions to think about. Is Hadoop really the best solution for all Big Data needs? Should companies run a Hadoop cluster on expensive enterprise-grade storage, or use cheap commodity servers? Should the chosen infrastructure be bare metal or virtualized? The picture becomes even more confusing at the analysis and visualization layer. The answer to Big Data ROI lies somewhere between the herd and nerd mentality. Thinking hard and being smart about each use case as early as possible avoids costly mistakes in choosing hardware and software. This talk will illustrate how Deutsche Telekom follows this segmentation approach to make sure every individual use case drives architecture design and the selection of technologies and vendors.
In this video from the HPC User Forum in Santa Fe, Yoonho Park from IBM presents: IBM Datacentric Servers & OpenPOWER.
"Big data analytics, machine learning and deep learning are among the most rapidly growing workloads in the data center. These workloads have the compute performance requirements of traditional technical computing or high performance computing, coupled with a much larger volume and velocity of data."
Watch the video: http://wp.me/p3RLHQ-gJv
Learn more: https://openpowerfoundation.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization (Denodo)
Watch here: https://bit.ly/2NGQD7R
In an era increasingly dominated by advancements in cloud computing, AI, and advanced analytics, it may come as a shock that many organizations still rely on data architectures built before the turn of the century. But that scenario is rapidly changing with the increasing adoption of real-time data virtualization, a paradigm shift in the approach that organizations take towards accessing, integrating, and provisioning data required to meet business goals.
As data analytics and data-driven intelligence take centre stage in today’s digital economy, logical data integration across the widest variety of data sources, with a proper security and governance structure in place, has become mission-critical.
Attend this session to learn:
- How you can meet cloud and data science challenges with data virtualization
- Why data virtualization is increasingly finding enterprise-wide adoption
- How customers are reducing costs and improving ROI with data virtualization
Many organizations focus on the licensing cost of Hadoop when considering migrating to a cloud platform. But other costs should be considered, as well as the biggest impact, which is the benefit of having a modern analytics platform that can handle all of your use cases. This session will cover lessons learned in assisting hundreds of companies to migrate from Hadoop to Databricks.
PLNOG 17 - Shabbir Ahmad - Dell Open Networking and Big Monitoring Fabric: unik... (PROIDEA)
A unique solution for effective network traffic monitoring! Every customer who operates a network struggles with the challenges of monitoring traffic effectively. This session will include a practical demo of a highly scalable, easy-to-implement-and-operate, and very cost-effective network traffic monitoring solution based on Dell Open Networking switches and BigSwitch Big Monitoring Fabric network software. It is a practical implementation of SDN (Software Defined Networking)!
Exploring the Wider World of Big Data – Vasalis Kapsalis (NetAppUK)
Every second of every day, you hear about electronic systems creating ever-increasing quantities of data. Systems in markets such as finance, media, healthcare, government, and scientific research feature strongly in the Big Data processing conversation, and extracting business value from Big Data is forecast to bring customer and competitive advantages. In this session, hear Vas Kapsalis, NetApp Big Data Business Development Manager, discuss his views and experience on the wider world of Big Data.
Bridging the Last Mile: Getting Data to the People Who Need It (Denodo)
Watch full webinar here: https://bit.ly/3cUA0Qi
Many organizations are embarking on strategically important journeys to embrace data and analytics. The goal can be to improve internal efficiencies, improve the customer experience, drive new business models and revenue streams, or – in the public sector – provide better services. All of these goals require empowering employees to act on data and analytics and to make data-driven decisions. However, getting data – the right data at the right time – to these employees is a huge challenge, and traditional technologies and data architectures are simply not up to the task. This webinar will look at how organizations are using Data Virtualization to quickly and efficiently get data to the people who need it.
Attend this session to learn:
- The challenges organizations face when trying to get data to the business users in a timely manner
- How Data Virtualization can accelerate time-to-value for an organization’s data assets
- Examples of leading companies that used data virtualization to get the right data to the users at the right time
Customer value analysis of big data products (Vikas Sardana)
A business value analysis using a Customer Value Model for software technology choices, with a case study of a Big Data use case from the mobile advertising industry.
Hadoop’s capabilities offer untapped potential for business insights, but companies often get weighed down with DIY platforms and fail to keep up with the requirements. Join this Dell EMC session, which will address this challenge with ready bundles that quickly deliver solutions for ETL offload, Single View, and IoT.
Get more value from your big data:
• Deploy big data applications faster
• Increase business agility
• Confidently deliver high performance and endless scale
• Improve IT operational efficiency
Speaker
Shawn Smith, Big Data Specialist, Dell EMC
Enabling the Software Defined Data Center for Hybrid IT (NetApp)
Recently, NetApp held a Cloud Breakfast for customers of our High Touch Customer Program. This was a combined presentation from OBS, VMware and NetApp.
Presenters:
Jim Sangster, Senior Director, Solutions Marketing, NetApp - "Cloud for the Hybrid Data Center"
John Gilmartin, Vice President, Cloud Infrastructure Products, VMware - "Next Generation of IT"
Axel Haentjens, Vice President, Marketing and International, Orange Cloud for Business - "NetApp Epic Story OBS"
Tim Waldron, Manager, Cloud Solutions, NetApp EMEA - "Cloud Services – An EMEA Perspective"
MapR Technologies Chief Marketing Officer, Jack Norris, talks about the advantages of Hadoop. He elaborates on multiple use cases and explains how MapR Technologies is the best Hadoop distribution.
Cloud Computing for Small & Medium Businesses (Al Sabawi)
I presented this topic at the Greater Binghamton Business Expo in Upstate New York. It is meant to shed light on utilizing Cloud Computing for small and medium-sized businesses. It should help decision makers consider Software-as-a-Service offerings for their business as a way to save on IT costs and deliver better efficiency for their organizations.
Introduction: This workshop will provide a hands-on introduction to Machine Learning (ML) with an overview of Deep Learning (DL).
Format: An introductory lecture on several supervised and unsupervised ML techniques, followed by a light introduction to DL and a short discussion of the current state of the art. Several Python code samples using the scikit-learn library will be introduced, which users will be able to run in the Cloudera Data Science Workbench (CDSW).
Objective: To provide a quick and short hands-on introduction to ML with python’s scikit-learn library. The environment in CDSW is interactive and the step-by-step guide will walk you through setting up your environment, to exploring datasets, training and evaluating models on popular datasets. By the end of the crash course, attendees will have a high-level understanding of popular ML algorithms and the current state of DL, what problems they can solve, and walk away with basic hands-on experience training and evaluating ML models.
Prerequisites: For the hands-on portion, registrants must bring a laptop with a Chrome or Firefox web browser. These labs will be done in the cloud; no installation is needed. Everyone will be able to register and start using CDSW after the introductory lecture concludes (about 1 hour in). Basic knowledge of Python is highly recommended.
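For a taste of the hands-on portion, here is a short, self-contained example in the spirit of the workshop's scikit-learn samples (the actual CDSW labs and datasets may differ): it trains a simple supervised model and evaluates it on held-out data.

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42)

    model = LogisticRegression(max_iter=1000)  # a simple supervised baseline
    model.fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))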
Floating on a RAFT: HBase Durability with Apache Ratis (DataWorks Summit)
In a world with a myriad of distributed storage systems to choose from, the majority of Apache HBase clusters still rely on Apache HDFS. Theoretically, any distributed file system could be used by HBase. One major reason HDFS is predominantly used is the specific durability requirement of HBase's write-ahead log (WAL), which HDFS guarantees correctly. However, HBase's use of HDFS for WALs can be replaced with sufficient effort.
This talk will cover the design of a "Log Service" which can be embedded inside of HBase that provides a sufficient level of durability that HBase requires for WALs. Apache Ratis (incubating) is a library-implementation of the RAFT consensus protocol in Java and is used to build this Log Service. We will cover the design choices of the Ratis Log Service, comparing and contrasting it to other log-based systems that exist today. Next, we'll cover how the Log Service "fits" into HBase and the necessary changes to HBase which enable this. Finally, we'll discuss how the Log Service can simplify the operational burden of HBase.
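As a rough illustration of the durability property such a Log Service provides (a toy Python sketch, not Ratis code), an append is acknowledged only once a majority of replicas has persisted it:

    from typing import List

    class Replica:
        def __init__(self, name: str) -> None:
            self.name = name
            self.log: List[bytes] = []

        def persist(self, entry: bytes) -> bool:
            self.log.append(entry)  # a real replica would fsync to disk here
            return True

    def replicated_append(replicas: List[Replica], entry: bytes) -> bool:
        """Acknowledge the write only when a quorum has persisted it."""
        acks = sum(1 for r in replicas if r.persist(entry))
        return acks >= len(replicas) // 2 + 1

    cluster = [Replica(f"r{i}") for i in range(3)]
    assert replicated_append(cluster, b"put row1 cf:col=v1")  # durable: 3/3 >= 2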
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi (DataWorks Summit)
Utilizing Apache NiFi, we read various open data REST APIs and camera feeds to ingest crime and related data in real time, streaming it into HBase and Phoenix tables. HBase makes an excellent storage option for our real-time time-series data sources. We can immediately query our data utilizing Apache Zeppelin against Phoenix tables, as well as Hive external tables mapped to HBase.
Apache Phoenix tables also make a great option since we can easily put microservices on top of them for application usage. I have an example Spring Boot application that reads from our Philadelphia crime table for front-end web applications as well as RESTful APIs.
Apache NiFi makes it easy to push records with schemas to HBase and insert into Phoenix SQL tables.
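As a hedged sketch of the query side (the URL, table, and column names are hypothetical), once NiFi has landed records in a Phoenix table, any SQL client can read them, for example via the Python phoenixdb driver against the Phoenix Query Server:

    import phoenixdb

    # Connect to a (hypothetical) Phoenix Query Server endpoint.
    conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
    cursor = conn.cursor()
    cursor.execute(
        "SELECT dc_dist, text_general_code, dispatch_date "
        "FROM PHILLY_CRIME WHERE dc_dist = ? LIMIT 10",
        ["18"],
    )
    for row in cursor.fetchall():
        print(row)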
Resources:
https://community.hortonworks.com/articles/54947/reading-opendata-json-and-storing-into-phoenix-tab.html
https://community.hortonworks.com/articles/56642/creating-a-spring-boot-java-8-microservice-to-read.html
https://community.hortonworks.com/articles/64122/incrementally-streaming-rdbms-data-to-your-hadoop.html
HBase Tales From the Trenches - Short stories about most common HBase operati... (DataWorks Summit)
While HBase is the most logical answer for use cases requiring random, real-time read/write access to Big Data, it may not be trivial to design applications that make the most of it, nor is it the simplest system to operate. Because it depends on and integrates with other components from the Hadoop ecosystem (ZooKeeper, HDFS, Spark, Hive, etc.) and external systems (Kerberos, LDAP), and because its distributed nature requires a "Swiss clockwork" infrastructure, many variables must be considered when observing anomalies or even outages. Adding to the equation, HBase is still an evolving product with different release versions in current use, some of which carry genuine software bugs. In this presentation, we'll go through the most common HBase issues faced by different organisations, describing the identified causes and resolution actions from my last five years supporting HBase for our heterogeneous customer base.
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac... (DataWorks Summit)
LocationTech GeoMesa enables spatial and spatiotemporal indexing and queries for HBase and Accumulo. In this talk, after an overview of GeoMesa’s capabilities in the Cloudera ecosystem, we will dive into how GeoMesa leverages Accumulo’s Iterator interface and HBase’s Filter and Coprocessor interfaces. The goal will be to discuss both what spatial operations can be pushed down into the distributed database and also how the GeoMesa codebase is organized to allow for consistent use across the two database systems.
OCLC has been using HBase since 2012 to enable single-search-box access to over a billion items from your library and the world's library collection. This talk will provide an overview of how HBase is structured to provide this information, some of the challenges they have encountered in scaling to support the world catalog, and how they have overcome them.
Many individuals/organizations have a desire to utilize NoSQL technology, but often lack an understanding of how the underlying functional bits can be utilized to enable their use case. This situation can result in drastic increases in the desire to put the SQL back in NoSQL.
Since the initial commit, Apache Accumulo has provided a number of examples to help jumpstart comprehension of how some of these bits function as well as potentially help tease out an understanding of how they might be applied to a NoSQL friendly use case. One very relatable example demonstrates how Accumulo could be used to emulate a filesystem (dirlist).
In this session we will walk through the dirlist implementation. Attendees should come away with an understanding of the supporting table designs, a simple text search supporting a single wildcard (on file/directory names), and how the dirlist elements work together to accomplish its feature set. Attendees should (hopefully) also come away with a justification for sometimes keeping the SQL out of NoSQL.
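For a flavor of the table design before the session (a toy in plain Python, not Accumulo client code), the dirlist trick is to prefix each path with a zero-padded depth so that a single contiguous range scan over a sorted key space returns exactly one directory's children:

    from bisect import bisect_left, bisect_right

    def key(path: str) -> str:
        depth = path.rstrip("/").count("/")  # "/" -> 0, "/home/ann" -> 2
        return f"{depth:03d}{path}"

    # A sorted key space stands in for an Accumulo table's sorted rows.
    rows = sorted(key(p) for p in [
        "/", "/home", "/home/ann", "/home/bob", "/var", "/var/log",
    ])

    def list_children(directory: str):
        prefix = f"{directory.rstrip('/').count('/') + 1:03d}{directory.rstrip('/')}/"
        lo, hi = bisect_left(rows, prefix), bisect_right(rows, prefix + "\xff")
        return [r[3:] for r in rows[lo:hi]]  # strip the 3-digit depth prefix

    print(list_children("/home"))  # -> ['/home/ann', '/home/bob']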
HBase Global Indexing to support large-scale data ingestion at Uber (DataWorks Summit)
Data serves as the platform for decision-making at Uber. To facilitate data driven decisions, many datasets at Uber are ingested in a Hadoop Data Lake and exposed to querying via Hive. Analytical queries joining various datasets are run to better understand business data at Uber.
Data ingestion, at its most basic form, is about organizing data to balance efficient reading and writing of newer data. Data organization for efficient reading involves factoring in query patterns to partition data to ensure read amplification is low. Data organization for efficient writing involves factoring the nature of input data - whether it is append only or updatable.
At Uber we ingest terabytes of data into many critical tables, such as trips, that are updatable. These tables are a fundamental part of Uber's data-driven solutions and act as the source of truth for all analytical use cases across the entire company. Datasets such as trips constantly receive updates in addition to inserts. To ingest such datasets we need a critical component that keeps bookkeeping information about the data layout and annotates each incoming change with the location in HDFS where the data should be written. This component is called Global Indexing. Without it, all records are treated as inserts and are re-written to HDFS instead of being updated, leading to data duplication and breaking data correctness and user queries. This component is key to scaling our jobs, where we now handle more than 500 billion writes a day in our current ingestion systems, and it must offer strong consistency and high throughput for index writes and reads.
At Uber, we chose HBase as the backing store for the Global Indexing component, a critical piece in scaling our jobs to more than 500 billion writes a day in our current ingestion systems. In this talk, we will discuss data@Uber and expound on why we built the global index using Apache HBase and how it helps scale our cluster usage. We'll give details on why we chose HBase over other storage systems; how and why we came up with a creative solution to automatically load HFiles directly into the backend, circumventing the normal write path, when bootstrapping our ingestion tables to avoid QPS constraints; as well as other learnings we had bringing this system up in production at the scale of data that Uber encounters daily.
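As a hedged sketch of the bookkeeping step (using the happybase HBase client; the host, table, and column names are hypothetical, and Uber's real implementation surely differs), an index lookup classifies each incoming record as an insert or an update and annotates it with its target location:

    import happybase

    connection = happybase.Connection("hbase-host")  # hypothetical host
    index = connection.table("global_index")         # record key -> HDFS location

    def annotate(record_key: bytes, incoming: dict) -> dict:
        row = index.row(record_key)
        location = row.get(b"loc:hdfs_file")
        if location:  # seen before: route the record as an update in place
            return {**incoming, "op": "update", "target": location.decode()}
        # first sighting: assign a location and remember it for next time
        target = f"/data/trips/file-{hash(record_key) % 1024:04d}"
        index.put(record_key, {b"loc:hdfs_file": target.encode()})
        return {**incoming, "op": "insert", "target": target}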
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix (DataWorks Summit)
Recently, Apache Phoenix has been integrated with the Apache Omid (incubating) transaction processing service to provide ultra-high system throughput with ultra-low latency overhead. Phoenix has been shown to scale beyond 0.5M transactions per second with sub-5ms latency for short transactions on industry-standard hardware. Omid, in turn, has been extended to support secondary indexes, multi-snapshot SQL queries, and massive-write transactions.
These innovative features make Phoenix an excellent choice for translytics applications, which allow converged transaction processing and analytics. We share the story of building the next-gen data tier for advertising platforms at Verizon Media that exploits Phoenix and Omid to support multi-feed real-time ingestion and AI pipelines in one place, and discuss the lessons learned.
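As a hedged sketch (the table and values are hypothetical, and exact property names may vary by Phoenix version), a Phoenix table can be declared transactional with Omid as the provider and then written atomically, here from Python via phoenixdb:

    import phoenixdb

    conn = phoenixdb.connect("http://localhost:8765/", autocommit=False)
    cursor = conn.cursor()
    cursor.execute(
        "CREATE TABLE IF NOT EXISTS ad_events ("
        "  event_id BIGINT NOT NULL PRIMARY KEY,"
        "  campaign VARCHAR,"
        "  spend DECIMAL(12,2)"
        ") TRANSACTIONAL=true, TRANSACTION_PROVIDER='OMID'"
    )
    cursor.execute("UPSERT INTO ad_events VALUES (1, 'spring_sale', 0.25)")
    cursor.execute("UPSERT INTO ad_events VALUES (2, 'spring_sale', 0.40)")
    conn.commit()  # both rows become visible atomically, in one snapshot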
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi (DataWorks Summit)
Cybersecurity requires an organization to collect data, analyze it, and alert on cyber anomalies in near real-time. This is a challenging endeavor when considering the variety of data sources which need to be collected and analyzed. Everything from application logs, network events, authentications systems, IOT devices, business events, cloud service logs, and more need to be taken into consideration. In addition, multiple data formats need to be transformed and conformed to be understood by both humans and ML/AI algorithms.
To solve this problem, the Aetna Global Security team developed the Unified Data Platform based on Apache NiFi, which allows them to remain agile and adapt to new security threats and the onboarding of new technologies in the Aetna environment. The platform currently has over 60 different data flows with 95% doing real-time ETL and handles over 20 billion events per day. In this session learn from Aetna’s experience building an edge to AI high-speed data pipeline with Apache NiFi.
In the healthcare sector, data security, governance, and quality are crucial for maintaining patient privacy and ensuring the highest standards of care. At Florida Blue, the leading health insurer of Florida serving over five million members, there is a multifaceted network of care providers, business users, sales agents, and other divisions relying on the same datasets to derive critical information for multiple applications across the enterprise. However, maintaining consistent data governance and security for protected health information and other extended data attributes has always been a complex challenge that did not easily accommodate the wide range of needs for Florida Blue’s many business units. Using Apache Ranger, we developed a federated Identity & Access Management (IAM) approach that allows each tenant to have their own IAM mechanism. All user groups and roles are propagated across the federation in order to determine users’ data entitlement and access authorization; this applies to all stages of the system, from the broadest tenant levels down to specific data rows and columns. We also enabled audit attributes to ensure data quality by documenting data sources, reasons for data collection, date and time of data collection, and more. In this discussion, we will outline our implementation approach, review the results, and highlight our “lessons learned.”
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, Presto has in the last few years experienced unprecedented growth in popularity in both on-premises and cloud deployments over object stores, HDFS, NoSQL and RDBMS data stores.
With an ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, the recently introduced Cost-Based Optimizer in Presto must account for heterogeneous inputs with differing and often incomplete data statistics. This talk will explore this topic in detail and discuss the best use cases for Presto across several industries. In addition, we will present recent Presto advancements such as geospatial analytics at scale and the project roadmap going forward.
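A small sketch of the "SQL-on-anything" idea using the presto-python-client; the host, catalogs, schemas and tables are placeholders. A single query can join an HDFS-backed Hive table with a table in an RDBMS.

    import prestodb

    conn = prestodb.dbapi.connect(host='presto-coordinator', port=8080,
                                  user='analyst', catalog='hive', schema='default')
    cur = conn.cursor()
    # One query federating two connectors: Hive over HDFS and MySQL.
    cur.execute("""
        SELECT c.segment, count(*) AS views
        FROM hive.web.page_views v
        JOIN mysql.crm.customers c ON v.customer_id = c.id
        GROUP BY c.segment
    """)
    for segment, views in cur.fetchall():
        print(segment, views)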
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
Specialized tools for machine learning development and model governance are becoming essential. MLflow is an open source platform for managing the machine learning lifecycle. Just by adding a few lines of code to the function or script that trains their model, data scientists can log parameters, metrics, artifacts (plots, miscellaneous files, etc.) and a deployable packaging of the ML model. Every time that function or script is run, the results are logged automatically as a byproduct of those added lines, even if the party doing the training run makes no special effort to record them. MLflow application programming interfaces (APIs) are available for the Python, R and Java programming languages, and MLflow sports a language-agnostic REST API as well. Over a relatively short time period, MLflow has garnered more than 3,300 stars on GitHub, almost 500,000 monthly downloads and 80 contributors from more than 40 companies. Most significantly, more than 200 companies are now using MLflow. We will demo the MLflow Tracking, Projects and Models components with Azure Machine Learning (AML) Services and show you how easy it is to get started with MLflow on-prem or in the cloud.
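The "few lines of code" point is easy to show; the minimal Tracking example below (with an arbitrary scikit-learn model standing in for the user's training script) logs a parameter, a metric and a deployable model package on every run.

    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    with mlflow.start_run():
        mlflow.log_param("C", 0.1)                                # parameter
        model = LogisticRegression(C=0.1, max_iter=200).fit(X, y)
        mlflow.log_metric("train_accuracy", model.score(X, y))    # metric
        mlflow.sklearn.log_model(model, "model")                  # deployable packaging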
Extending Twitter's Data Platform to Google CloudDataWorks Summit
Twitter's Data Platform is built using multiple complex open source and in-house projects to support data analytics on hundreds of petabytes of data. The platform supports storage, compute, data ingestion, discovery and management, along with various tools and libraries that help users with both batch and real-time analytics. Our Data Platform operates on multiple clusters across different data centers to help thousands of users discover valuable insights. As we scaled our Data Platform to multiple clusters, we also evaluated various cloud vendors to support use cases outside of our data centers. In this talk we share our architecture and how we extend our data platform to use the cloud as another data center. We walk through our evaluation process and the challenges we faced supporting data analytics at Twitter scale in the cloud, and present our current solution. Extending Twitter's Data Platform to the cloud was a complex task, which we examine in depth in this presentation.
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
At Comcast, our team has been architecting a customer experience platform which is able to react to near-real-time events and interactions and deliver appropriate and timely communications to customers. By combining the low latency capabilities of Apache Flink and the dataflow capabilities of Apache NiFi we are able to process events at high volume to trigger, enrich, filter, and act/communicate to enhance customer experiences. Apache Flink and Apache NiFi complement each other with their strengths in event streaming and correlation, state management, command-and-control, parallelism, development methodology, and interoperability with surrounding technologies. We will trace our journey from starting with Apache NiFi over three years ago and our more recent introduction of Apache Flink into our platform stack to handle more complex scenarios. In this presentation we will compare and contrast which business and technical use cases are best suited to which platform and explore different ways to integrate the two platforms into a single solution.
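As a toy illustration of the Flink side of this split, the PyFlink sketch below filters and enriches an event stream to trigger communications; in the real platform events would arrive from NiFi (for example over a Kafka hop) rather than from the static collection used here, and the event fields are invented.

    from pyflink.datastream import StreamExecutionEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    events = env.from_collection([
        {'customer': 'c1', 'type': 'outage', 'severity': 3},
        {'customer': 'c2', 'type': 'login', 'severity': 0},
    ])
    # Act only on high-severity, customer-impacting events.
    alerts = (events
              .filter(lambda e: e['severity'] >= 2)
              .map(lambda e: "notify %s: %s" % (e['customer'], e['type'])))
    alerts.print()
    env.execute("event_driven_actions")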
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
Companies are increasingly moving to the cloud to store and process data. One of the challenges they face is securing data across hybrid environments while managing policies centrally. In this session, we will talk through how companies can use Apache Ranger to protect access to data both on-premises and in cloud environments. We will go into detail on the challenges of hybrid environments and how Ranger can solve them. We will also discuss how companies can further enhance security by leveraging Ranger to anonymize or tokenize data while moving it into the cloud, and to de-anonymize it dynamically using Apache Hive or Apache Spark, or when accessing data from cloud storage systems. We will also take a deep dive into Ranger's integration with AWS S3, AWS Redshift and other cloud-native systems, and wrap up with an end-to-end demo showing how policies can be created in Ranger and used to manage access to data in different systems, anonymize or de-anonymize data, and track where data is flowing.
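As one concrete flavor of this, the hedged sketch below creates a Ranger column-masking policy through Ranger's public REST API from Python; the service, database, table, column, group and credentials are placeholders. With such a policy in place, Hive or Spark return masked (tokenized) values to non-privileged users at query time.

    import requests

    policy = {
        "service": "hadoopdev_hive",            # assumed Ranger service name
        "name": "mask_customer_ssn",
        "policyType": 1,                        # 1 = data-masking policy
        "resources": {
            "database": {"values": ["customers"]},
            "table":    {"values": ["profiles"]},
            "column":   {"values": ["ssn"]},
        },
        "dataMaskPolicyItems": [{
            "accesses": [{"type": "select", "isAllowed": True}],
            "groups": ["analysts"],
            "dataMaskInfo": {"dataMaskType": "HASH"},   # tokenize on read
        }],
    }
    resp = requests.post("http://ranger-admin:6080/service/public/v2/api/policy",
                         json=policy, auth=("admin", "password"))
    resp.raise_for_status()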
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
Advanced Big Data processing frameworks have been proposed to harness the fast data transmission capability of Remote Direct Memory Access (RDMA) over high-speed networks such as InfiniBand, RoCEv1, RoCEv2, iWARP, and OmniPath. However, with the introduction of Non-Volatile Memory (NVM) and NVM Express (NVMe) based SSDs, these designs, along with the default Big Data processing models, need to be re-assessed to discover the possibilities of further enhanced performance. In this talk, we will present NRCIO, a high-performance communication runtime for non-volatile memory over modern network interconnects that can be leveraged by existing Big Data processing middleware. We will show the performance of non-volatile memory-aware RDMA communication protocols using our proposed runtime and demonstrate its benefits by incorporating it into a high-performance in-memory key-value store, Apache Hadoop, Tez, Spark, and TensorFlow. Evaluation results illustrate that NRCIO can achieve up to a 3.65x performance improvement for representative Big Data processing workloads on modern data centers.
Background: Some early applications of Computer Vision in Retail arose from e-commerce use cases - but increasingly, it is being used in physical stores in a variety of new and exciting ways, such as:
● Optimizing merchandising execution, in-stocks and sell-thru
● Enhancing operational efficiencies and enabling real-time customer engagement
● Enhancing loss prevention capabilities and response times
● Creating frictionless experiences for shoppers
Abstract: This talk will cover the use of Computer Vision in retail and its implications for the broader Consumer Goods industry, and will share the business drivers, use cases and benefits that are unfolding as Computer Vision becomes an integral component in the remaking of an age-old industry.
We will also take a ‘peek under the hood’ of Computer Vision and Deep Learning, sharing technology design principles and skill set profiles to consider before starting your CV journey.
Deep learning has matured considerably in the past few years to produce human or superhuman abilities in a variety of computer vision paradigms. We will discuss ways to recognize these paradigms in retail settings, collect and organize data to create actionable outcomes with the new insights and applications that deep learning enables.
We will cover the basics of object detection, then move into the advanced processing of images, describing possible ways a retail store of the near future could operate: a deep learning system attached to a camera stream identifying various storefront situations, such as item stocks on shelves, a shelf in need of organization, or a wandering customer in need of assistance.
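A minimal sketch of that first step, using an off-the-shelf torchvision detector on a single shelf-camera frame; the stock-counting heuristic, file name and threshold are illustrative assumptions rather than a production design.

    import torch
    from torchvision.models.detection import fasterrcnn_resnet50_fpn
    from torchvision.transforms.functional import to_tensor
    from PIL import Image

    model = fasterrcnn_resnet50_fpn(pretrained=True).eval()
    frame = to_tensor(Image.open("shelf_camera_frame.jpg"))
    with torch.no_grad():
        detections = model([frame])[0]       # boxes, labels, scores
    # Count confident detections as a rough proxy for items on the shelf.
    on_shelf = int((detections["scores"] > 0.8).sum())
    if on_shelf < 10:                        # arbitrary restock threshold
        print("Only %d items detected: flag shelf for restocking" % on_shelf)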
We will also cover how to use a computer vision system to automatically track customer purchases to enable a streamlined checkout process, and how deep learning can power plausible wardrobe suggestions based on what a customer is currently wearing or purchasing.
Finally, we will cover the various technologies powering these applications today: deep learning tools for research and development, production tools to distribute that intelligence to the full inventory of cameras situated around a retail location, and tools for exploring and understanding the new data streams produced by the computer vision systems.
By the end of this talk, attendees should understand the impact Computer Vision and Deep Learning are having in the Consumer Goods industry, key use cases, techniques and key considerations leaders are exploring and implementing today.
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
Whole-genome shotgun-based next-generation transcriptomics and metagenomics studies often generate 100 to 1,000 gigabytes (GB) of sequence data derived from tens of thousands of different genes or microbial species. De novo assembly of these data requires a solution that both scales with data size and optimizes for individual genes or genomes. Here we developed an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions reads based on their molecule of origin to enable downstream assembly optimization. SpaRC produces high clustering performance on transcriptomics and metagenomics test datasets from both short-read and long-read sequencing technologies. It achieved near-linear scalability with respect to input data size and number of compute nodes, and it can run on different cloud computing environments without modification while delivering similar performance. In summary, our results suggest that SpaRC provides a scalable solution for clustering billions of reads from next-generation sequencing experiments, and that Apache Spark represents a cost-effective solution with rapid development/deployment cycles for similar big data genomics problems.
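The sketch below is not SpaRC itself, but a toy PySpark rendering of its core idea: bucket reads by shared k-mers so that reads likely to come from the same molecule land in the same group. The sequences and k value are illustrative.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("read-clustering-sketch").getOrCreate()
    sc = spark.sparkContext
    K = 5  # real pipelines use much longer k-mers

    reads = sc.parallelize([("r1", "ACGTACGTAC"), ("r2", "CGTACGTTAA"),
                            ("r3", "TTGGCCAATT")])

    def kmers(read_id, seq):
        return [(seq[i:i + K], read_id) for i in range(len(seq) - K + 1)]

    # k-mer -> set of reads sharing it; co-occurring reads seed clusters.
    clusters = (reads.flatMap(lambda r: kmers(*r))
                     .groupByKey()
                     .mapValues(set)
                     .filter(lambda kv: len(kv[1]) > 1))
    print(clusters.collect())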
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
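As a taste of the Python binding mentioned above, the short pypowsybl snippet below loads a bundled IEEE 14-bus test case, runs an AC power flow and inspects the resulting bus voltages (the actual notebook used in the workshop may differ).

    import pypowsybl as pp

    network = pp.network.create_ieee14()   # bundled IEEE 14-bus test network
    results = pp.loadflow.run_ac(network)  # run an AC power flow
    print(results[0].status)               # convergence status
    print(network.get_buses()[["v_mag", "v_angle"]].head())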
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has created gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chains and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how they work. He has around 20 years of solution-engineering experience in application security, software continuous delivery, and SaaS platforms, and is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation, however, takes hard work: vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at every stage.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We closed with a lovely workshop in which participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell us all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details of how best to design a sturdy architecture within ODC.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
2. Hadoop Makes Shareholders Happy
The World's Largest Telcos are Driving Business Performance with Hadoop at the Center of an Enterprise-Wide Modern Data Architecture
Juergen Urbanski
CEO, Tech Alpha
Board Member Big Data & Analytics, BITKOM (German IT Industry Association)
3. Agenda
• Telco Data Management Challenges
• Hadoop Business Value
• Data Lake Business Value
• Data Lake Reference Architecture
• 21 Telco Use Cases for Hadoop
– Network Infrastructure
– Service and Security
– Sales and Marketing
– New and Adjacent Business
4. Enterprise Data Management Challenges
Limited Insight:
• Schema on write
• Data in silos
Limited Scale:
• Not designed to scale
• Not affordable at scale
(Diagram: the traditional stack of physical infrastructure, data management, data access, and presentation & application layers, with data management split across engineered systems and shared storage systems serving OLTP, OLAP and traditional analytics.)
5. Business Value of Hadoop
Hadoop core capabilities, spanning the data access and data management layers:
Broader Insights:
• Allows simultaneous access by, and timely insights for, all your users across all your data
• Irrespective of the processing engine, analytical application or presentation layer
• Enabled by schema on read and an enterprise-wide pool of data
Unlimited Scale:
• Allows you to acquire all data in its original format and store it in one place, cost-effectively and for an unlimited time
• Affordable and performant well into the 100+ petabyte scale
6. A New Approach for Broader Insights
Traditional approach:
• Apply schema on write
• Heavily dependent on IT
• Single query engine (SQL): determine a list of questions, design the solution, collect structured data, ask the questions from the list, detect additional questions
Hadoop approach:
• Apply schema on read: iterate over structure, transform and analyze
• Support a range of access patterns to data stored in HDFS (polymorphic access)
• Right engine, right job: batch, interactive, real-time, in-memory
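A small illustrative PySpark fragment of the schema-on-read pattern just described (paths and field names are assumed): raw events are landed untyped in HDFS, and structure is imposed only when a question is asked.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("schema-on-read").getOrCreate()

    # No upfront modelling: the schema is inferred at read time, so new
    # questions do not require re-landing or re-modelling the data.
    events = spark.read.json("hdfs:///datalake/raw/clickstream/")
    (events.where(col("event_type") == "page_view")
           .groupBy("page")
           .count()
           .show())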
7. Compelling Economics Allow Scale
Fully loaded cost per raw TB deployed (US$ '000s, min to max):
• SAN: 36 to 180
• EDW / MPP: 20 to 80
• Engineered system (e.g., Oracle Exadata): 12 to 18
• NAS: 10 to 20
• Hadoop: 0.250 to 1
• Cloud storage: 0.1 to 0.3
Hadoop provides highly scalable data storage at 5% of the cost of the alternatives.
8. 5 Capabilities of Hadoop 2.x Enable the Data Lake
Data lake functional requirements:
1. Data Integration & Governance: integrate with existing systems; move data into, within and out of the environment
2. Security: provide a layered approach to security
3. Operations: deploy and manage a multi-tenant environment easily, using existing tools where possible
4. Environment and Deployment Model: run anywhere
5. Presentation & Application: enable existing and new applications
Underpinned by Data Access = Insight (ask questions later, or in the moment) and Data Management = Scale (store first).
9. Data Lake Reference Architecture
• Data Management – Storage: HDFS (Hadoop Distributed File System); multi-tenant processing: YARN (the Hadoop operating system)
• Data Access – Batch: MapReduce; script: Pig; SQL: Hive; in-memory: Spark; online: HBase, Accumulo; real-time: Storm; and others; metadata management: HCatalog
• Data Integration & Governance – Real-time and batch ingest: Flume, Sqoop, WebHDFS, NFS; data workflow and lifecycle: Falcon
• Security – Authentication, authorization, accountability and data protection across storage (HDFS), resources (YARN), access (Hive, …), pipeline (Falcon) and cluster (Knox)
• Operations – Provision, manage and monitor: Ambari; scheduling: Oozie
• Environment and Deployment Model – Commodity hardware, appliance, on-premise, virtualized, cloud/hosted; Linux and Windows
• Presentation & Application on top
10. Multiple Use Cases and Tools Run on Hadoop as a Shared Service
Hadoop 1.x, dedicated project silos ("data ponds"): each business unit (BU1-BU4) runs its own stack for customer intimacy (HBase), operational excellence (Lucene), new business (Storm) and risk management (MapReduce), resulting in:
• Poor resource management
• Limited governance
• Batch processing, no streams
Hadoop 2.x, an enterprise-wide shared service ("data lake"): one cluster serves the same workloads across all business units.
11. Data Lake Business Rationale
Shared-service operational benefits, similar to an infrastructure cloud:
• Speed of provisioning and de-provisioning for capacity and users
• Fast learning curve and reduced operational complexity
• Consistent enforcement of data security, privacy and governance
• Optimal capital efficiency driven by scale and load balancing
Value grows exponentially as data from more applications lands in one Hadoop 2.x data lake:
• The marginal cost of retaining data is less than its marginal value
• Able to run a broader range of analyses
• More data in one place usually leads to better answers
• The result is order-of-magnitude better insights
12. Technical and Business Drivers
Foundation for a modern data architecture.
New data types:
• Sensors
• Machine-generated data
• Geolocation
• Documents, email, voice-to-text
• Social networks
• Web logs, click streams
Business drivers:
• Operational excellence, e.g., network maintenance
• Compliance & risk management, e.g., fraud reduction
• Customer intimacy, e.g., 360° view of the customer
• New business, e.g., data as a product
13. 21 Telco Use Cases for Hadoop
• Network Infrastructure
– Network capacity planning
– Network upgrades
– Network maintenance
– Network performance management
– Network traffic shaping
• Service and Security
– Customer experience analytics
– Contact center productivity
– Field service productivity
– Data protection and compliance
– End-user device security
• Sales and Marketing
– 360-degree view of customer value
– Personalized marketing campaigns
– Upselling and cross-selling
– Next-product-to-buy (NPTB)
– Churn reduction
• New and Adjacent Business
– New product development
– Actionable intelligence serving: advertisers, merchants/retailers, payment processors, federal governments, local governments
14. Hadoop in Network Infrastructure
Business problems: network capacity planning, network upgrades, network maintenance, network performance management, network traffic shaping.
Value realized:
• Hadoop is used to optimize the rollout of 4G coverage in time and space to match the likely pick-up in service revenue, allowing an operator to defer more than 10% of capex for the same resulting revenue.
• Hadoop helped detect that only a small number of congested cable network nodes were responsible for the majority of churn, and could thus be prioritized for maintenance and upgrades.
• Network function virtualization, software-defined networking and unified all-IP networks vastly increase the amount of machine and log data relevant for troubleshooting. Hadoop helps with root cause analysis and may even be used to reason on the data in real time.
15. Network Infrastructure – Network Capacity Planning
Business problem: the consumption of services, and the resulting bandwidth, in a particular neighborhood may be out of sync with a telco's plans to build new towers or transmission lines in that same neighborhood. This leads to a mismatch between expensive infrastructure investments and the actual revenue from those investments. Examples: 4G (LTE), FTTC (fiber to the curb), FTTH (fiber to the home).
Value realized: one European carrier used Hadoop to optimize the rollout of 4G coverage in time and space to match the likely pick-up in service revenue, based on detailed cell tower traffic data from the last few years. With their prior, less informed approach, they would have had to spend 10% more capex for the same outcome.
16. Network Infrastructure – Network Upgrades
Business problem: Hadoop is used for targeted network maintenance and upgrades by cable companies. One large US cable MSO was unsure how cable network congestion affects churn, and where exactly network upgrades produce the most incremental revenue.
Value realized: only a small number of nodes turned out to be responsible for the majority of the negative customer experience, and those could therefore be prioritized for upgrades.
17. Hadoop in Network Infrastructure – Network Upgrades Improve the Customer Experience
• Correlated network congestion and customer experience across 11 different data sources
• 4m subscriber records, 12m work orders, 9m calls, 42m IPDRs, 20m Tivoli NPMs
• Finding: only a few nodes were responsible for most of the negative customer experience
Source data: network node TNMP, CMTS performance, network sensors, IPDR cable modem usage, competitive spend data, household master subscriber record, marketing demographics, caller experience, work orders, mobile devices, customer premise equipment, online transactions, social media interactions.
18. Network Infrastructure – Network Maintenance
Business problem: radio access networks provide the air interface between a mobile provider and end-user mobile devices. Maintenance and repair of radio access networks pose substantial logistical challenges: in most countries, mobile networks cover more than 95% of the country's surface area, and many transmission towers are in remote and difficult-to-access locations. In high-density areas, pico- and femtocells optimize local coverage, but in turn require coordination with the building owner for maintenance.
Value realized: Hadoop improves a provider's ability to service equipment proactively, which is always cheaper and less disruptive than replacing equipment that has already failed.
19. Network Infrastructure – Network Performance Management
Business problem: the existing network management platform was meant to diagnose poor cellular service such as dropped calls or poor audio quality, but it was overwhelmed by data volume, ingesting 10 million messages per second. Each analysis was limited to a 24-hour time window and only one-fiftieth of the surface area of the United States. The same customer issue may generate multiple support calls, yet the operator's team could not see relationships between multiple variables across time. Is the problem with the customer's device? Is it their neighborhood or proximity to a tower? Is it because of how they use their phone?
Value realized: with more history, they are able to explore root causes that they had never been able to identify by reviewing just one day's data, allowing them to improve cell phone service.
20. Hadoop in Service and Security
Business problems: customer experience analytics based on call detail records (CDRs), contact center productivity, field service productivity, data protection and compliance, end-user device security.
Value realized:
• With Hadoop, one operator detected that 25% of callers were contacting the call center merely to have the late fees on their monthly bill waived; clearly a case for call deflection to interactive voice recognition and online self-service.
• Contact center agents had insufficient ways of diagnosing what was wrong with customers, leading to many unnecessary truck rolls. Hadoop helped avoid these.
• 3% of smartphones account for 10-15% of traffic because of malware (notably on Android phones) and some fair-use violations. Hadoop helps detect this so operators can take remedial action.
21. Service and Security – Customer Experience Analytics Based on Call Detail Records (CDRs)
Business problem: a typical mobile service provider generates more than 1 billion CDRs per day, ingesting millions of CDRs per second. The system holds more than 100 billion records, with half a petabyte added every month. Due to the cost of existing solutions, the data expires after 60 days. CDRs need to be analyzed and archived for compliance, billing and congestion monitoring, for example forensics on dropped calls and poor sound quality. High volume makes pattern recognition and root cause analysis difficult, and these often need to happen in real time, with a customer waiting for answers.
Value realized: with Hadoop the carrier can retain some data for up to three years. Hadoop provides both a cost advantage (storage 20x cheaper than enterprise-grade storage) and better insights: better analysis to continuously improve call quality, customer satisfaction and servicing margins.
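A hedged PySpark sketch of the kind of CDR forensics described here: compute each subscriber's dropped-call ratio over a multi-year history and surface the worst 5%. Column names and paths are assumptions, not the carrier's actual schema.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("cdr-drop-ratio").getOrCreate()
    cdrs = spark.read.parquet("hdfs:///datalake/cdrs/")  # years of history, not 60 days

    ratios = (cdrs.groupBy("subscriber_id")
                  .agg((F.sum(F.when(F.col("disposition") == "dropped", 1)
                               .otherwise(0)) / F.count("*"))
                       .alias("drop_ratio")))
    cutoff = ratios.approxQuantile("drop_ratio", [0.95], 0.01)[0]
    worst = ratios.where(F.col("drop_ratio") >= cutoff)
    worst.write.mode("overwrite").parquet("hdfs:///datalake/care/worst_drop_ratio/")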
22. Service and Security – Contact Center Productivity
Business problem: a US-based mobile provider struggled with a combination of high costs and low customer satisfaction related to customer care. An increasing share of support cases are related to mobile data usage and the associated charges. Traditionally, contact center agents did not have granular insights into a particular customer's data usage and hence were unable to provide effective call resolution.
Value realized: with Hadoop, the operator detected that 25% of callers were contacting the call center merely to have the late fees on their monthly bill waived. The provider was able to off-load these cases to online self-service and interactive voice recognition, freeing up agents to focus on more valuable customer interactions. The provider is now extending this solution to focus on issue resolution.
23. Service and Security – Field Service Productivity
Business problem: a provider's contact center agents had insufficient ways of diagnosing what was wrong with customers, leading to many unnecessary truck rolls. In particular, the agents were not able to triage network vs. home-based problems accurately enough, so technicians were dispatched to the customer premises for problems that resided within the network.
Value realized: the provider was able to avoid a large number of "false positive" truck rolls. With each truck roll costing about $150 fully loaded, the provider saved several million dollars in the first year alone.
24. Service and Security – End User Device Security
Business problem: a mobile operator needed to identify real-time malware threats from non-trusted application stores and contain their impact on customers. 3% of smartphones account for 10-15% of traffic because of malware (notably on Android phones) and some fair-use violations.
Value realized: Hadoop helps detect this so operators can take remedial action, eliminating a disproportionate share of network tonnage. Options ranged from notifying an affected customer all the way to blocking certain URLs for the whole network.
25. Hadoop in Sales and Marketing
Business problems: 360-degree view of customer value, personalized marketing campaigns, upselling and cross-selling, next-product-to-buy (NPTB), churn reduction.
Value realized:
• Telesales revenue increased by 50% by tracking competitor web sites visited and making counter-offers on products searched
• Conversion rates increased by more than 20% by optimizing and personalizing the path-to-transaction
• A $1.65 ARPU increase across 1 million customers boosts the top line by $20 million per year
• Reduced cable subscriber churn ("cord cutting"); every 100,000 subscribers equates to a customer lifetime value of $1 billion
• Improved churn model quality; price-related churn down by 40%
26. Sales and Marketing – 360 Degree View of Customer Value
Business problem: telcos and cable companies interact with customers across many channels and points in time, but data about those interactions is stored in silos. It is difficult to correlate data about customer purchases, marketing campaign results, and online browsing behavior. The problem is exacerbated by recent acquisitions and a proliferation in the volume and type of customer data. Merging that data in a relational database structure is slow, expensive and technically difficult.
Value realized: an enterprise-wide data lake of several petabytes provides a 360-degree unified view of customer (or household) lifetime value based on usage patterns across time, products and channels.
27. Sales and Marketing – Personalized Marketing Campaigns
Business problem: marketers have long sought ways to tailor their marketing campaigns to the needs of each individual customer. Telcos are uniquely positioned to deliver on that goal, because mobile phones not only follow their owners everywhere but also reveal a lot about their owners' interests through browsing behavior and the applications present on the phone. Telcos are looking for ways to mine that information. One provider risked losing substantial revenue as prepaid customers were starting to switch to a competitor as a result of a particularly effective marketing campaign.
Value realized: the provider used Hadoop to pinpoint the individual customers most at risk of churning, and then built a highly targeted campaign to retain the remaining customers in that segment. A churn alarm system was established and revenue leakage was minimized. Telesales revenue increased by 50% by tracking competitor web sites visited and making counter-offers on products searched; conversion rates rose by more than 20% by optimizing and personalizing the path-to-transaction; and a $1.65 ARPU increase across 1 million customers boosts the top line by $20 million per year.
28. Sales and Marketing – Up-selling and Cross-selling
Business problem: the provider needed an approach to upsell smartphones into a user base that was still largely on legacy feature phones.
Value realized: the operator converted many hundreds of thousands of feature phone users to smartphones with associated data plans.
29. Sales and Marketing – Next Product to Buy (NPTB)
Business problem: as telco product portfolios grow more complex, there are ever more opportunities to sell additional services to the same customer base. Many sales reps, however, are overwhelmed by that complexity and struggle to translate the breadth of the product portfolio into incremental sales.
Value realized: confident NPTB recommendations, based on data from all customers, empower sales associates and improve their interactions with customers pre-transaction.
30. Sales and Marketing – Churn Reduction
Business problem: a North American provider faced the following challenge: 50% of new customers churned within 6 months of acquisition. The average customer lifetime in this segment was 13 months, well short of the 18 months needed to break even.
Value realized: the provider increased the "right" customer acquisitions by 27% and decreased subsequent churn in this segment by 50%. Price-related churn fell by 40%. Reducing cable subscriber churn ("cord cutting") matters: every 100,000 subscribers equates to a customer lifetime value of $1 billion.
31. Hadoop in New and Over-the-Top / Adjacent Businesses
Business problems: new product development; actionable intelligence serving advertisers, merchants/retailers, payment processors, federal governments and local governments; Hadoop-as-a-Service.
Value realized: telcos are well positioned to provide big data as a service to retail, hospitality and logistics customers. This can generate $50-100m in annual revenue for each medium-sized country.
32. New and Adjacent Businesses – New Product Development
Business problem: mobile devices produce large amounts of data about where, when, how and why they are used. This data is extremely valuable for product managers, yet much of it is out of reach: either it is never captured or it is never converted into business insight. Its volume and variety make it difficult to ingest, store and analyze at scale.
Value realized: one provider that logged 27m devices with more than 1bn events per month developed more than 20 projects and pilots within 18 months of launch, leading to increased revenue and profitability.
33. New and Adjacent Businesses – Actionable Intelligence Serving Advertisers
Business problem: Europe's leading real estate marketplace Scout24, a subsidiary of Deutsche Telekom, features more than one million properties for rent or sale at any given time and has facilitated more than 20 million property transactions over the last few years. The company wanted to drive more market share to Scout24 by offering advertisers, typically real estate agents and brokers, an even better service.
Value realized: a small team consisting of a product manager, a data scientist and a few developers was able to make a meaningful contribution to revenue growth.
34. Big Data as a Product: ImmobilienScout (Deutsche Telekom)
35. New and Adjacent Businesses – Actionable Intelligence Serving Merchants
Business problem: a French mobile service provider is a great example of how location information per customer segment can be used to optimize the promotions and point-of-sale locations of bricks-and-mortar retailers.
Value realized: the retailers were able to increase their reported same-store sales through better campaign management and in-store optimizations. They also gained valuable insights to optimize their store networks.
36. New and Adjacent Businesses – Actionable Intelligence Serving Payment Processors
Business problem: credit card issuers experience increasing fraud when their card members are travelling abroad.
Value realized: 95% of travelers opted into the SMS alerting service, resulting in a substantial decrease in fraud related to card use in foreign countries.
37. New and Adjacent Businesses – Actionable Intelligence Serving Federal Governments
Business problem: the eastward expansion of the European Union has resulted in a longer and more porous border with non-EU member states. This has made it more difficult to protect the EU against a stream of illegal goods and refugees, which often travel over land from the EU's eastern and south-eastern neighbors.
Value realized: law enforcement agencies are able to target their scarce resources much more effectively, for instance choosing to intercept suspicious cars traveling in certain directions at speeds above 130 km/h. This radically increases their hit rate per mission.
38. New and Adjacent Businesses – Actionable Intelligence Serving Local Governments
Business problem: in a large French city, traffic to large events regularly caused massive congestion on the city's streets and highways.
Value realized: the city identified and implemented dozens of specific traffic management measures, relieving congestion around major events. It is also exploring how to use these insights for environmental impact studies, city planning and disaster management.
39. Hadoop Drives Business Outcomes for the World's Telcos and Cable Companies!
• Makes capital investments more efficient
• Leads to a better customer experience
• Lowers churn
• Increases conversions
• Strengthens security
• Opens up new markets
40. Questions?
Email juergen@techalpha.com for a copy of the presentation.
LinkedIn: juergenurbanski
Download the 200-page BITKOM / Forrester Guide to Big Data Technologies (in German):
http://www.bitkom.org/files/documents/BITKOM_Leitfaden_Big-Data-Technologien-Wissen_fuer_Entscheider_Febr_2014.pdf
Speaker notes, additional use cases by category:
Network infrastructure: proactive customer care and fault resolution, e.g., 3 call attempts to the same number within several seconds; top 5% of customers with the highest call-drop ratio, targeted according to error type.
Service and security: CDR analytics & archiving for compliance, billing & congestion monitoring; contact center log analytics; fraud reduction for pre-paid mobile services; security analytics.
Sales and marketing: cross-channel 360-degree view of the customer; personalized marketing campaigns, notably for upselling & cross-selling; next-best-offer predictive recommendations at the point of sale; social network and deep packet inspection analysis; web site optimization; advanced pricing, with segmentation based on price seekers, "in danger" and "up-sell"; deep packet analytics for churn prevention and counter-offers; churn prevention by tracking competitor web sites visited and re-calculating propensities to churn; counter-offers on products searched via Google (USB modems, branded handsets, DSL connections, flat voice & data tariffs); sales completion actions (empty-basket re-targeting, check-out completion, search completion, e.g., iPhone 4s); web page personalization (offerings based on previous experience and on affinity); path-to-transaction optimization.