Whether or not you’ve heard of Google’s MapReduce, its impact on Big Data applications, data warehousing, ETL, business intelligence and data mining is reshaping the market for business analytics and data processing.
Attend this session to hear from Curt Monash on the basics of the MapReduce framework, how it is used, and what implementations like SQL-MapReduce enable.
In this session you will learn:
* The basics of MapReduce, key use cases, and what SQL-MapReduce adds
* Which industries and applications are heavily using MapReduce
* Recommendations for integrating MapReduce into your own BI and data warehousing environment
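To make the basics concrete, the classic word-count example shows the three steps of the MapReduce programming model. A map function emits key/value pairs, the framework groups them by key (the shuffle), and a reduce function combines each group. The following is a minimal single-process sketch in Python that uses only the standard library. It illustrates the model only. It does not show how Hadoop, SQL-MapReduce or any other implementation distributes the work, handles failures or exposes its API.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every whitespace-separated token.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group intermediate values by key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    intermediate = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data big insight", "data warehousing and data mining"]
    print(word_count(docs))
```

On a real cluster, the map and reduce calls run in parallel on different machines. The sketch keeps the same function boundaries so that this parallelism is easy to picture.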
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr. Abdourahmane Faye, Big Data SME Lead DACH at HPE
Watch more from Data Natives Berlin 2016 here: http://bit.ly/2fE1sEo
Visit the conference website to learn more: www.datanatives.io
Follow Data Natives:
https://www.facebook.com/DataNatives
https://twitter.com/DataNativesConf
Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2017: http://bit.ly/1WMJAqS
About the Author:
Abdou Faye is a subject matter expert in Big Data, Predictive Analytics / Machine Learning and Business Intelligence, with more than 19 years of experience in the field in various leadership and executive roles, from technical, architecture and sales perspectives. He recently joined HPE from SAP, where since 2010 he had led the Predictive Analysis & Big Data CoE (Center of Excellence) business for the DACH, CEE and CIS region, in charge of business development and sales support. Prior to SAP, he worked for four years at Microsoft as a Senior BI & SQL Server Consultant in Switzerland, after 10 years at Philip Morris (CH), Orange Telco (CH) and SEMA Group (FR). Abdou graduated from Paris 11 University in 2000 with a PhD in Data Mining/Predictive Analytics, after completing a Master's in Computer Science.
"Industrializing Machine Learning – How to Integrate ML in Existing Businesses", Erik Schmiegelow, CEO at Hivemind Technologies AG
About the Author:
Since 1996, Erik Schmiegelow has worked as a software architect and consultant, building large data processing platforms for companies such as NTT DoCoMo, Royal Mail, Siemens, E-Plus, Allianz and T-Mobile; until 2001 he was CTO at the Cologne-based digital agency denkwerk.
In 2007 he founded the telecommunications consulting agency Itellity, followed by Hivemind Technologies in 2014. Hivemind Technologies is a solutions and services company focused on big data analytics and stream processing technologies for web, social data and industrial applications. Erik studied computer science in Hamburg.
Big Data & Analytics continues to redefine business. Data has transitioned from an underused asset to the lifeblood of the organisation, and a critical component of business intelligence, insight and strategy.
Big Data Scotland is the largest annual data analytics conference held in Scotland: it is supported by ScotlandIS and The Data Lab and free for delegates to attend. The conference is geared towards senior technologists and business leaders and aims to provide a unique forum for knowledge exchange, discussion and cross-pollination.
The programme will explore the evolution of data analytics, looking at key tools and techniques and how these can be applied to deliver practical insight and value. Presentations will span a wide array of topics, from Data Wrangling and Visualisation to AI, Chatbots and Industry 4.0.
Key Topics
• Tools and techniques
• Corporate data culture, business processes, digital transformation
• Business intelligence, trends, decision making
• AI, Real-time Analytics, IoT, Industry 4.0, Robotics
• Security, regulation, privacy, consent, anonymization
• Data visualisation, interpretation and communication
• CRM and Personalisation
Big data expert and Infochimps CEO Jim Kaskade presents the Infinite Monkey Theorem at CloudCon Expo. He offers an energetic, inspiring and practical perspective on why Big Data is disruptive: it is more than historical data analyzed on Hadoop, and more than real-time streaming data stored and queried using NoSQL. Learn more at www.Infochimps.com
Many IT experts have heard of Big Data, or at least have an idea of what it is. In practice, however, only a few people in Germany currently work with it. Yet Big Data brings an entirely new momentum to modern software solutions and has become indispensable in the context of the mobile, cloud and social shifts. Big Data makes software intelligent and thus lets users experience it in an entirely new way. Big Data is giving rise to new software architectures, because information is processed in a completely different way: faster, in a more differentiated manner, and often with the aim of drawing conclusions and making predictions.
This talk explains how to design modern software architectures so that you can successfully implement Big Data paradigms, and what advantages this brings for increasingly mobile software solutions. We also take a look at the potential and options in industries such as banking, insurance and retail.
Big Data Real Time Analytics - A Facebook Case Study (Nati Shalom)
Building Your Own Facebook Real Time Analytics System with Cassandra and GigaSpaces.
Facebook's real-time analytics system is a good reference for anyone looking to build a real-time analytics system for big data.
The first part covers the lessons from Facebook's experience and the reasons they chose HBase over Cassandra.
In the second part of the session, we learn how to build our own real-time analytics system, achieve better performance, gain real business insights and analytics from our big data, and make deployment and scaling significantly simpler using the new version of Cassandra and GigaSpaces Cloudify.
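The abstract does not spell out the system's internals. A widely used pattern for real-time counters at this scale aggregates events in memory per time window and flushes batched increments to the backing store periodically, instead of issuing one write per event. The sketch below shows that idea in Python under stated assumptions. A plain `dict` stands in for an HBase or Cassandra counter table, and the class and method names (`WindowedCounter`, `record`, `flush`) are hypothetical. They are not part of any product named above.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class WindowedCounter:
    """Aggregate event counts in memory and flush them to a store in batches.

    `store` is a plain dict acting as a hypothetical stand-in for a persistent
    counter table keyed by (key, window_start).
    """

    window_seconds: int = 60
    store: dict[tuple[str, int], int] = field(default_factory=dict)
    _pending: Counter = field(default_factory=Counter, repr=False)

    def record(self, key: str, timestamp: int) -> None:
        # Bucket the event into the window that contains its timestamp.
        window_start = timestamp - timestamp % self.window_seconds
        self._pending[(key, window_start)] += 1

    def flush(self) -> int:
        # Apply all pending increments in one batch; return the number of rows touched.
        for row, delta in self._pending.items():
            self.store[row] = self.store.get(row, 0) + delta
        touched = len(self._pending)
        self._pending.clear()
        return touched


if __name__ == "__main__":
    counter = WindowedCounter(window_seconds=60)
    for ts in (0, 10, 59, 61):
        counter.record("example.com/page", ts)
    counter.flush()
    print(counter.store)
```

Batching trades a little freshness (up to one flush interval) for far fewer writes to the store.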
As the Big Data market has evolved, the focus has shifted from data operations (storage, access and processing of data) to data science (understanding, analyzing and forecasting from data). And as new models are developed, organizations need a process for deploying analytics from research into the production environment. In this talk, we'll describe the five stages of real-time analytics deployment:
Data distillation
Model development
Model validation and deployment
Model refresh
Real-time model scoring
We'll review the technologies supporting each stage, and how Revolution Analytics software works with the entire analytics stack to bring Big Data analytics to real-time production environments.
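As a rough illustration of how the five stages fit together, here is a toy pipeline in Python that uses a one-feature least-squares model. Everything in it is an assumption made for illustration: the record format (`x`, `y`), the model, the mean-absolute-error acceptance threshold and the function names are all hypothetical. The sketch does not represent Revolution Analytics software, which is R-based, or any production scoring service.

```python
from statistics import mean

Model = tuple[float, float]  # (intercept, slope)


def distill(raw: list[dict]) -> list[tuple[float, float]]:
    # Stage 1, data distillation: keep complete records, project to (feature, target).
    return [(r["x"], r["y"]) for r in raw
            if r.get("x") is not None and r.get("y") is not None]


def develop(data: list[tuple[float, float]]) -> Model:
    # Stage 2, model development: fit y = a + b*x by ordinary least squares.
    # Requires at least two distinct x values.
    x_bar = mean(x for x, _ in data)
    y_bar = mean(y for _, y in data)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in data)
             / sum((x - x_bar) ** 2 for x, _ in data))
    return y_bar - slope * x_bar, slope


def validate(model: Model, holdout: list[tuple[float, float]], max_mae: float) -> bool:
    # Stage 3, validation: deploy only if mean absolute error on held-out data is low enough.
    return mean(abs(y - score(model, x)) for x, y in holdout) <= max_mae


def refresh(model: Model, old: list, new: list, holdout: list, max_mae: float) -> Model:
    # Stage 4, model refresh: refit on all data, but keep the old model if the candidate fails.
    candidate = develop(old + new)
    return candidate if validate(candidate, holdout, max_mae) else model


def score(model: Model, x: float) -> float:
    # Stage 5, real-time scoring: a cheap function applied to each incoming record.
    intercept, slope = model
    return intercept + slope * x
```

Keeping `score` tiny and free of training logic reflects the general idea that real-time scoring should be cheap, while the expensive stages run offline.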
Watch this recorded webcast and listen to Infochimps CSO and Co-Founder, Dhruv Bansal, and Think Big Analytics Principal Architect, Douglas Moore, share successful use cases and recommendations for building real-time predictive analytics in your enterprise.
SQL Azure Database is a cloud database service from Microsoft that provides web-facing database functionality as a utility service. Cloud-based database solutions such as SQL Azure can provide many benefits, including rapid provisioning, cost-effective scalability, high availability and reduced management overhead. This paper provides an overview of some scale-out strategies, the challenges of scaling out on-premises, and how you can benefit from scaling out with SQL Azure.
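Scaling out a relational database usually means partitioning (sharding) the data across several databases and routing each request to the right one. The paper's specific strategies are not reproduced here. The sketch below shows one generic approach, hash-based routing, in Python. The shard names are hypothetical, and the code does not use any SQL Azure API.

```python
import hashlib


def shard_for(key: str, shards: list[str]) -> str:
    """Route a key to one of `shards` using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per
    process for strings and therefore not stable across application servers.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def group_by_shard(keys: list[str], shards: list[str]) -> dict[str, list[str]]:
    # Fan a batch of keys out to their shards so each database receives one request.
    routed: dict[str, list[str]] = {s: [] for s in shards}
    for key in keys:
        routed[shard_for(key, shards)].append(key)
    return routed


if __name__ == "__main__":
    shards = ["customers_db_0", "customers_db_1", "customers_db_2"]  # hypothetical names
    print(shard_for("customer-42", shards))
```

A stable hash matters because every application server must route the same key to the same shard. Simple modulo routing remaps most keys when a shard is added. Consistent hashing or a lookup directory are common alternatives when the shard count changes over time.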
Protecting data privacy in analytics and machine learning - ISACA London UK (Ulf Mattsson)
ISACA London Chapter webinar, Feb 16th 2021
Topic: “Protecting Data Privacy in Analytics and Machine Learning”
Abstract:
In this session, we will discuss a range of emerging technologies for privacy and confidentiality in machine learning and data analytics, and how to put these technologies to work for databases and other data sources.
When we think about developing AI responsibly, there are many different activities we need to consider.
This session also discusses international standards and emerging privacy-enhanced computation techniques, including secure multiparty computation, zero trust, cloud and trusted execution environments. We will discuss the “why, what, and how” of techniques for privacy-preserving computing.
We will review how different industries are taking advantage of these privacy-preserving techniques. A retail company used secure multiparty computation to respect user privacy and specific regulations while still gaining insights and protecting the organization’s IP. A healthcare organization uses secure data sharing to protect the privacy of individuals, and also stores and searches encrypted medical data in the cloud.
We will also review the benefits of secure data sharing for financial institutions, including a large bank that wanted to broaden access to its data lake without compromising data privacy, while preserving the data’s analytical quality for machine learning purposes.
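Secure multiparty computation can sound abstract. Here is a minimal sketch of one of its simplest building blocks, additive secret sharing, used to compute a sum without any single party seeing another party's input. The sketch assumes honest-but-curious parties that do not collude, and the modulus and function names are illustrative choices. It is not the method used by the retailer, healthcare organization or bank described above, and real deployments should rely on a vetted MPC framework.

```python
import secrets

# Field modulus (the Mersenne prime 2^61 - 1); it must exceed any possible total.
PRIME = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    # Split `value` into n random shares that sum to `value` modulo PRIME.
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_inputs: list[int]) -> int:
    n = len(private_inputs)
    # Each input owner splits its value and sends share j to party j.
    all_shares = [share(v, n) for v in private_inputs]
    # Party j sees only one uniformly random share per owner and adds them up.
    partial_sums = [sum(row[j] for row in all_shares) % PRIME for j in range(n)]
    # Publishing the partial sums reveals the total, not the individual inputs.
    return sum(partial_sums) % PRIME


if __name__ == "__main__":
    print(secure_sum([120, 45, 300]))
```

Any single share on its own is a uniformly random number, which is why it leaks nothing about the input. Only the combination of all partial sums recovers a result, and that result is the aggregate.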
Data Science Out of The Box: Case Studies in the Telecommunication by Anand ... (Data Con LA)
Abstract: Telecommunications service providers (or telcos) have access to massive amounts of historical and streaming data about subscribers. However, it often takes them a long time to build, operationalize and gain value from machine learning and analytic models. This is true even for relatively common use cases like churn prediction, purchase propensity, next top-up or purchase prediction, subscriber profiling, customer experience modeling, recommendation engines and fraud detection. In this talk, I shall describe our approach to tackling this problem: a pre-packaged set of analytic pipelines on a scalable Big Data architecture that work on several standard, well-known telco data formats and sources, and that we were able to reuse across several different telcos. This allows telcos to deploy the analytic pipelines on their data out of the box and go live in a matter of weeks, as opposed to the several months it used to take when they started from scratch. I shall describe our experiences deploying the pre-packaged analytic pipelines with several telcos in North America, Southeast Asia and the Middle East. The pipelines work on a variety of historical and streaming data, including call data records containing voice, SMS and data usage information, purchase and recharge behavior, location information, browsing/clickstream data, billing and payment information, smartphone device logs, etc. The pipelines run on a combination of Spark and Unscrambl BRAIN™, which includes a real-time machine learning framework, a scalable profile store based on Redis and an aggregation engine that stores efficient summaries of time-series data. I shall describe some of the machine learning models that get trained and scored as part of these pipelines.
I shall also remark on how reusable certain models are across different telcos, and how a similar set of features can be used for models like next top-up or purchase prediction, churn prediction and purchase propensity across similar telcos in different geographies.
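Reusing features across telcos presupposes a common way of summarizing usage data per subscriber. The Python sketch below shows the idea with a hypothetical, simplified usage-record schema (`UsageRecord` with `kind`, `amount` and `day` fields). Real call data records differ by operator, and this is not Unscrambl BRAIN's data model or feature set.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical, simplified usage record; real CDR schemas vary by operator.
    subscriber_id: str
    kind: str      # "voice" (minutes), "sms" (messages) or "data" (MB); other kinds raise KeyError
    amount: float
    day: int       # day index within the observation window


def subscriber_features(records: list[UsageRecord], as_of_day: int) -> dict[str, dict[str, float]]:
    # Aggregate raw usage into one fixed-length feature dict per subscriber.
    totals: dict[str, dict[str, float]] = defaultdict(
        lambda: {"voice": 0.0, "sms": 0.0, "data": 0.0})
    active_days: dict[str, set[int]] = defaultdict(set)
    for r in records:
        totals[r.subscriber_id][r.kind] += r.amount
        active_days[r.subscriber_id].add(r.day)

    features = {}
    for sub, usage in totals.items():
        days = active_days[sub]
        features[sub] = {
            **usage,
            "active_days": float(len(days)),
            # Recency: a long gap since last activity is often used as a churn signal.
            "days_since_last_use": float(as_of_day - max(days)),
        }
    return features
```

Because every subscriber ends up with the same feature names regardless of the operator's raw format, a model trained on these features could, in principle, be retrained or reused at another telco once its records are mapped to the same schema.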
As the adoption of AI technologies increases and matures, the focus will shift from exploration to time to market, productivity and integration with existing workflows. Governing enterprise data, scaling AI model development, and selecting a complete, collaborative hybrid platform and tools for rapid solution deployment are key focus areas for growing data science teams tasked with responding to business challenges. This talk will cover the challenges and innovations of AI at scale for the enterprise, focusing on the modernization of data analytics, the AI ladder and AI life cycle, and infrastructure architecture considerations. We will conclude by reviewing the benefits of running modern AI and data analytics applications such as SAS Viya and SAP HANA on IBM Power Systems and IBM Storage in hybrid cloud environments.
Lessons from building a stream-first metadata platform | Shirshanka Das, Stealth (HostedbyConfluent)
"For data-driven enterprises, the most important objective is unlocking the value of their data. To enable this, data scientists are increasingly turning towards data discovery tools (also known as data catalogs) that can help them locate the right dataset or insight and use it correctly. But are all data catalogs the same?
In this talk, I describe how a stream-first architecture was a critical design element that benefited the implementation of our data catalog. We follow the evolution of LinkedIn DataHub’s architecture over the past few years from a simple search tool to a streaming metadata platform that drives productivity and governance workflows across the company.
Join this talk to learn:
* How different data discovery / catalog tools are architected and the tradeoffs in each kind of architecture
* How streaming architectures can benefit metadata
* How event-driven metadata architectures can supercharge your data productivity and governance workflows at your company"
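In a stream-first design, services do not write to a metadata database directly. Instead, every metadata change is published as an event, and the catalog (search index, lineage graph, and so on) is built by consuming that stream in order. The Python sketch below illustrates the idea with a hypothetical event shape and an in-memory catalog. It does not reproduce DataHub's actual event schemas or APIs, and the list of events stands in for a topic on a broker such as Kafka.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataChangeEvent:
    # Hypothetical event shape: which entity changed, which aspect, and its new value.
    entity: str
    aspect: str
    value: object


@dataclass
class Catalog:
    entities: dict[str, dict[str, object]] = field(default_factory=dict)

    def apply(self, event: MetadataChangeEvent) -> None:
        # Upsert the aspect; a later event for the same aspect overwrites an earlier one.
        self.entities.setdefault(event.entity, {})[event.aspect] = event.value

    def search(self, term: str) -> list[str]:
        # Naive keyword search over entity names and string-valued aspects.
        term = term.lower()
        return sorted(
            name for name, aspects in self.entities.items()
            if term in name.lower()
            or any(isinstance(v, str) and term in v.lower() for v in aspects.values())
        )


def consume(events, catalog: Catalog) -> None:
    # Stand-in for a stream consumer that applies events in log order.
    for event in events:
        catalog.apply(event)
```

Because the catalog is derived from the event log, additional consumers (for example, a governance checker) can be attached to the same stream without changing the producers.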
Big data has been one of the most popular terms in the IT industry over the past decade. The term is vague and broad enough that essentially every one of us lives in a big-data world. Every time you do a Google search, like a post on Facebook, write something in WeChat or view an item on Amazon, you both use and contribute to someone's big data system. Managing so much data across many computers introduces unique challenges. In this talk, we review the landscape of big data platforms and discuss some lessons we learned from building them.
Deep Learning Image Processing Applications in the Enterprise (Ganesan Narayanasamy)
The presentation covers many use cases, including the following:
Image classification: "The process of identifying and detecting an object or a feature in a digital image or video," the report states. In retail, deep learning models "quickly scan and analyze in-store imagery to intuitively determine inventory movement."
Voice recognition: "The ability to receive and interpret dictation or to understand and carry out spoken commands. Models are able to convert captured voice commands to text and then use natural language processing to understand what is being said and in what context." In transportation, deep learning "uses voice commands to enable drivers to make phone calls and adjust internal controls - all without taking their hands off the steering wheel."
Anomaly detection: "Deep learning technique strives to recognize abnormal patterns which don't match the behaviors expected for a particular system, out of millions of different transactions. These applications can lead to the discovery of an attack on financial networks, fraud detection in insurance filings or credit card purchases, even isolating sensor data in industrial facilities signifying a safety issue."
Recommendation engines: "Analyze user actions in order to provide recommendations based on user behavior."
Sentiment analysis: "Leverages deep learning-heavy techniques such as natural language processing, text analysis, and computational linguistics to gain clear insight into customer opinion, understanding of consumer sentiment, and measuring the impact of marketing strategies."
Video analysis: "Process and evaluate vast streams of video footage for a range of tasks including threat detection, which can be used in airport security, banks, and sporting events."
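The anomaly-detection description boils down to flagging observations that deviate from expected behavior. The presentation refers to deep learning models. The Python sketch below deliberately swaps in a much simpler statistical baseline, the z-score, to show the underlying idea. The default threshold of 3 standard deviations is a conventional but arbitrary choice.

```python
from statistics import mean, pstdev


def zscore_anomalies(values: list[float], threshold: float = 3.0) -> list[int]:
    """Return the indexes of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values, mu)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    readings = [10.0] * 20 + [100.0]  # illustrative sensor readings with one spike
    print(zscore_anomalies(readings))
```

A baseline like this is useful for judging whether a deep learning model actually adds value on a given dataset. A single extreme value also inflates the standard deviation itself, so robust variants (for example, ones based on the median absolute deviation) are often preferred.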
Digital Shift in Insurance: How is the Industry Responding with the Influx of... (DataWorks Summit)
The digitally connected world is changing the technology environments that insurers must create to thrive in the new era of computing. Customer interactions and business processes, from product and risk management to claims management, are continuously changing. During this session we will review recent research and insights from insurance companies in the life, general and reinsurance markets, and discuss what this means for insurers as the industry considers the implications for core systems, predictive and preventive analytics, and customer experience.
The insurance industry is spending millions of dollars annually on InsurTech investments, ranging from risk listening and customer interactions (chatbots, SMS messaging, smart interactive conversations) to new methods of evaluating claims (digital capture at notice of incident, dashcams, connected homes/vehicles).
These are all new types of data which the industry hasn't previously had to manage and govern.
Additionally, at the heart of this is how to create new business opportunities from data. We will also hold an interactive discussion exploring the insurance implications of the new computing environment, from AI and Big Data to IoT (edge computing).
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the... (Amazon Web Services)
Amazon Machine Learning is a service that makes it easy for developers of all skill levels to use machine learning technology, and Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools. Combined, the two can power advanced analytics that not only describe what has happened in the past but also make intelligent predictions about the future. Join this webinar to learn how to get the most value from your data for your data-driven business.
Learning Objectives:
How to scale your Redshift queries with user-defined functions (UDFs)
How to apply machine learning to historical data in Amazon Redshift
How to visualize your data with Amazon QuickSight
Present a reference architecture for advanced analytics
Who Should Attend:
Application developers looking to add UDFs, or predictive analytics to their applications, database administrators that need to meet the demand of data driven organizations, decision makers looking to derive more insight from their data
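The learning objectives mention scaling queries with user-defined functions and applying machine learning to historical data. One common pattern trains a model elsewhere and pushes only its scoring logic into the warehouse as a scalar function. The Python function below is a minimal sketch of such scoring logic. It assumes a logistic model with made-up coefficients and hypothetical column names (`tenure_months`, `monthly_spend`). It is not Amazon Machine Learning's API. Check the current Amazon Redshift documentation to see whether and how logic like this can be registered as a UDF, including which languages and runtime versions are supported. The sketch is written for modern Python 3.

```python
import math
from typing import Optional


def churn_score(tenure_months: Optional[float], monthly_spend: Optional[float]) -> Optional[float]:
    """Logistic scoring with fixed, illustrative coefficients (not a trained model)."""
    if tenure_months is None or monthly_spend is None:
        return None  # mirror SQL NULL semantics: NULL in, NULL out
    # Hypothetical linear predictor: longer tenure and higher spend lower the score.
    z = 1.5 - 0.08 * tenure_months - 0.01 * monthly_spend
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    print(churn_score(12.0, 50.0))
```

Keeping the function pure, with no I/O or global state, makes it straightforward to test outside the warehouse before it is used in queries.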
Millions of dollars are being spent annually by the insurance industry in InsurTech investments from risk listening, customer interactions (chatbots, SMS messaging, smart interactive conversations), to methods of evaluating claims (digital capture at notice of incident, dashcams, connected homes/vehicles).
These are all new types of data which the industry hasn't previously had to manage and govern.
Additionally, at the heart of this is how to create new business opportunities from data. We will also have an interactive conversation on discussing and exploring insurance implications of the new computing environment from AI, Big Data and IoT (Edge computing).
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...Amazon Web Services
Amazon Machine Learning is a service that makes it easy for developers of all skill levels to use machine learning technology and Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. The combination of the two can provide a solution to power advanced analytics for not only what has happened in the past, but make intelligent predictions about the future. Please join this webinar to learn how get the most value from your data for your data driven business.
Learning Objectives:
How to scale your Redshift queries with user-defined functions (UDFs)
How to apply Machine learning to historical data in Amazon Redshift
How to visualize your data with Amazon QuickSight
Present a reference architecture for advanced analytics
Who Should Attend:
Application developers looking to add UDFs, or predictive analytics to their applications, database administrators that need to meet the demand of data driven organizations, decision makers looking to derive more insight from their data
Powering a Graph Data System with Scylla + JanusGraphScyllaDB
Key Value and Column Stores are not the only two data models Scylla is capable of. In this presentation learn the What, Why and How of building and deploying a graph data system in the cloud, backed by the power of Scylla.
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaGoDataDriven
Matei Zaharia is an assistant professor of computer science at Stanford University, Chief Technologist and Co-founder of Databricks. He started the Spark project at UC Berkeley and continues to serve as its vice president at Apache. Matei also co-started the Apache Mesos project and is a committer on Apache Hadoop. Matei’s research work on datacenter systems was recognized through two Best Paper awards and the 2014 ACM Doctoral Dissertation Award.
Best Practices for Building and Deploying Data Pipelines in Apache SparkDatabricks
Many data pipelines share common characteristics and are often built in similar but bespoke ways, even within a single organisation. In this talk, we will outline the key considerations which need to be applied when building data pipelines, such as performance, idempotency, reproducibility, and tackling the small file problem. We’ll work towards describing a common Data Engineering toolkit which separates these concerns from business logic code, allowing non-Data-Engineers (e.g. Business Analysts and Data Scientists) to define data pipelines without worrying about the nitty-gritty production considerations.
We’ll then introduce an implementation of such a toolkit in the form of Waimak, our open-source library for Apache Spark (https://github.com/CoxAutomotiveDataSolutions/waimak), which has massively shortened our route from prototype to production. Finally, we’ll define new approaches and best practices about what we believe is the most overlooked aspect of Data Engineering: deploying data pipelines.
AWS re:Invent 2016: Zillow Group: Developing Classification and Recommendatio...Amazon Web Services
Customers are adopting Apache Spark ‒ an open-source distributed processing framework ‒ on Amazon EMR for large-scale machine learning workloads, especially for applications that power customer segmentation and content recommendation. By leveraging Spark ML, a set of machine learning algorithms included with Spark, customers can quickly build and execute massively parallel machine learning jobs. Additionally, Spark applications can train models in streaming or batch contexts, and can access data from Amazon S3, Amazon Kinesis, Amazon Redshift, and other services. This session explains how to quickly and easily create scalable Spark clusters with Amazon EMR, build and share models using Apache Zeppelin and Jupyter notebooks, and use the Spark ML pipelines API to manage your training workflow. In addition, Jasjeet Thind, Senior Director of Data Science and Engineering at Zillow Group, will discuss his organization's development of personalization algorithms and platforms at scale using Spark on Amazon EMR.
Movile Internet Movel SA: A Change of Seasons: A big move to Apache CassandraDataStax Academy
A few years ago, processing large volumes of data was an exclusive problem of big companies. Nowadays, technological advancement allows people to be connected with each other all the time, generating and consuming large amounts of data.
In the challenge to follow Movile's exponential growth and increasing volume of information, we soon realized that traditional relational database and data analysis solutions were no longer a good fit to solve new order issues. Therefore, we present Movile's 'Change Of Seasons', a use case on adopting Apache Cassandra as a solution for critical high-performance distributed systems.
Cassandra Summit 2015 - A Change of SeasonsEiti Kimura
A CHANGE OF SEASONS: A big move to Apache Cassandra!
This is an extended version of the material presented at Cassandra Summit 2015 - Santa Clara - California - USA.
In this presentation I will show you 3 moves, use cases, that constitute our Big Move to Apache Cassandra @Movile.
Walking through relational model to NoSQL solution, hybrid platforms and a staggering cost reduction and throughput increase.
Real-time Analytics for Data-Driven ApplicationsVMware Tanzu
SpringOne Platform 2017
Milind Bhandarkar, Ampool
"To provide hyper-personalized digital experiences in the emerging market transformation, innovative enterprises are building modern data-driven applications to deliver continuing value to their always-connected customers. Such applications need to utilize closed-loop deep insights to influence their users' behaviors in real-time. However, the traditional ways of capturing users' interactions, transporting data to large data warehouses or data lakes, further away from applications, and processing these data across multiple slow stages cannot meet the real-time expectations of both customers and businesses.
What if one could capture, analyze, and serve data from a highly concurrent, high-performance data store powering these applications? In this talk, we'll present a memory-centric Active Data Store (ADS), powered by Apache Geode, to meet the exigent demands of modern applications while providing operational simplicity. Ampool's ADS allows fast ingest and storage of 'hot' app data, in situ updates and analysis, and data serving from the same scalable distributed in-memory data store. As the data cools (ages), Ampool ADS automatically tiers data to warm and cold secondary stores. By speeding analytics several-fold, Ampool enables feeding actionable insights back to applications, driving decisions in a closed loop.
We will demonstrate the applicability of Ampool ADS for such an app by serving all data-access patterns from a single memory-centric store."
Razorfish Multi-Channel Marketing: Better Customer Segmentation and TargetingTeradata Aster
Matt Comstock, Vice President Business Intelligence Office, Razorfish, presents at the Big Analytics 2012 Roadshow.
From search to email to social, customers are interacting with your brand across a variety of channels. But what do people do once they view an advertisement or get an email? What common behaviors are displayed once they’re on your site? By combining media exposure/behavior, site-side media, and in-store purchase data, you can understand better the impact media has on driving value to your business. Come to this session to learn how better data-driven multi-channel analysis lets you see what consumers do before they become a customer to understand what content influences which segments of users by media audience. Discover new segmentation and targeting strategies to improve engagement with your brand and increase advertising lift. See how a leader in digital marketing uses a combination of technologies including Teradata Aster, Hadoop, and Amazon Web Services to handle big data and provide big analytics to improve business value.
Gayatri Patel, eBay, presents at the Big Analytics 2012 Roadshow
The wonders of what data can do for an organization is measured in the productivity and competitiveness of their team's decisions. Some believe more data is the key. Agreed...but good decisions require more than just deriving intelligence from big data. In this dynamic market, the need to socialize and evolve ideas with other teams, quickly correlate information across sources, and test ideas to fail fast early are strong enablers to gain competitive footing. eBay¹s analytic and technology advancements garners insights and approaches that continue to help our employees tell their "data stories" and make better decisions.
Using Data to Manage in Today’s Chaotic EnvironmentTeradata Aster
DJ Patil, Data Scientist, Greylock Partners, presents at the Big Analytics 2012 Roadshow in San Francisco.
The ability to manage and leverage data has never been more critical to business. At the same time, the volume and types of data have grown dramatically in the past few years. The choices for technologies, people, and processes are complex. In this keynote session, Dr.DJ Patil will talk about how to manage through all this chaos.
This data is from registrants for the Big Analytics 2012 events. The survey asked participants to classify themselves as “business” or “IT”.
Survey details:
Number of survey respondents and date -
San Francisco (April) = 507
Boston (May)= 322
Chicago (June) = 441
New York (Dec) = 894
TOTAL = 2164
Bill Franks, Chief Analytics Officer, Teradata, presents at the 2012 Big Analytics Roadshow.
As enterprises come to understand the value of analytics, more support and funding is being allocated to build these departments. Managers are now faced with the challenge of who to hire. What exactly makes a great analytic professional? Is a Data Scientist a "must have"? Should a candidate have a PhD? Is prior experience in a specific industry vital? Just what is the right fit when creating a successful team? The answers to these questions are still unclear as the value of analytics continues to grow.
In this session, Bill Franks, author of the book, Taming the Big Data Tidal Wave, addresses these and many more questions as he defines the characteristics of high-performing data scientists and great analytics teams.
Practical Applications of Visual AnalyticsTeradata Aster
Dustin Smith, Community Manager, Tableau Software, presents at the 2012 Big Analytics Roadshow.
Organizations now have the ability to store and process massive amounts of data like never before. And there are huge expectations for turning data into a fundamental driver for business transformation and competitive advantage.
Visual analytics is helping everyday employees gain insight into data in order to solve unexpected problems and challenges, it is changing the way people interact with data and the way business intelligence is defined in organizations. In this presentation, we will share real-world examples of how everyday people can and are using visual analytics to solve some of businesses most challenging issues.
Trust and Influence in the Complex Network of Social MediaTeradata Aster
William Rand, University of Maryland, presents at the 2012 Big Analytics Roadshow.
The dramatic feature of social media is that it gives everyone a voice; anyone can speak out and express their opinion to a crowd of followers with little or no cost or effort, which creates a loud and potentially overwhelming marketplace of ideas. The good news is that the organizations have more data than ever about what their consumers are saying about their brand. The bad news is that this huge amount of data is difficult to sift through. We will look at developing methods that can help sift through this torrent of data and examine important questions, such as who do users trust to provide them with the information and the recommendations that they want? Which tastemakers have the greatest influence on social media users? Using agent-based modeling, machine learning and network analysis we begin to examine and shed light on these questions and develop a deeper understanding of the complex system of social media.
Mohanbir Sawhney, Robert R. McCormick Tribune Foundation Clinical Professor of Technology Kellogg School of Management, Northwestern University presents at the 2012 Big Analytics Roadshow.
Companies are drinking from a fire hydrant of data that is too big, moving too fast and is too diverse to be analyzed by conventional database systems. Big Data is like a giant gold mine with large quantities of ore that is difficult to extract. To get value out of Big Data, enterprises need a new mindset and a new set of tools. They also need to know how to extract actionable insights from Big Data that can lead to competitive advantage. The Big Story of Big Data is not what Big Data is, but what it means for business value and competitive advantage.... read more: http://www.biganalytics2012.com/sessions.html#mohan_sawhney
Big Brands Meet Big Data – The Newest Innovator’s DilemmaTeradata Aster
Marc Parrish, Vice President, Retention & Loyalty Marketing, Barnes & Noble, presents at the 2012 Big Analytics Roadshow.
Big Data is moving too fast for Big Brands. They don't have the ability to technically pivot, to move quickly enough to take advantage of the astounding amount of customer information that's available, and make it part of their everyday practices. This poses a great risk to the world’s great retailers. Well-managed companies often fail because the very same management practices that made them industry leaders also make it difficult to assimilate the disruptive technologies that in the end allow others to steal their market.
With Big Data, the gap between merely sustaining your operations, and adopting disruptive technologies, is the difference between progress, or perish.
Simplifying Big Data Analytics for the BusinessTeradata Aster
Tasso Argyros, Co-Founder & Co-President, Teradata Aster presents at the 2012 Big Analytics Roadshow.
The opportunity exists for organizations in every industry to unlock the power of iterative, big data analysis with new applications such as digital marketing optimization and social network analysis to improve their bottom line. Big data analysis is not just the ability to analyze large volumes of data, but the ability to analyze more varieties of data by performing more complex analysis than is possible with more traditional technologies. This session will demonstrate how to bring the science of data to the art of business by empowering more business users and analysts with operationalized insights that drive results. See how data science is making emerging analytic technologies more accessible to businesses while providing better manageability to enterprise architects across retail, financial services, and media companies.
Evaluating Big Data Predictive Analytics PlatformsTeradata Aster
Mike Gualtieri, Principal Analyst, Forrester Research, presents at the Big Analytics Roadshow, 2012 in New York City on December 12, 2012
Presentation title: Evaluating Big Data Predictive Analytics Platforms
Abstract: Great. You have Big Data. Now what? You have to analyze it to find game-changing predictive models that you can use to make smart decisions, reduce risk, or deliver breakthrough customer experiences. Big Data Predictive Analytics solutions are software and/or hardware solutions that allow firms to discover, evaluate, optimize, and deploy predictive models by analyzing big data sources. In this session, Forrester Principal Analyst Mike Gualtieri will discuss the key criteria you should use to evaluate Big Data Predictive Analytics platforms to meet your specific needs.
Keynote: Cross Industry Lessons from Moneyball AnalyticsTeradata Aster
Ari Kaplan keynote presentation at the Big Analytics Roadshow, 2012 in New York City on December 12, 2012
Presentation title: "Cross Industry Lessons from Moneyball Analytics", by Ari Kaplan, "Moneyball" advisor to Major League Baseball teams and President of AriBall
Ari Kaplan is a leading figure in sports analytics. Known throughout the Major Leagues for revolutionizing and modernizing player assessment, Ari's use of analytics and technology helps coaches prepare for games, players understand their strengths and weaknesses, General Managers forecast future performance and risk of player contracts and draft picks, and more.
In this presentation, Kaplan discusses how professional sports teams and players use analytics and data visualization in the Major Leagues. Through his 23 years of experience in over half of all MLB organizations, he will discuss the changes that took place and where analytics will continue to innovate in the future.
Technology Strategies for Big Data Analytics, Teradata Aster
SAS Presentation delivered at the Big Analytics Roadshow, 2012 in New York City on December 12, 2012
Presentation title: Technology Strategies for Big Data Analytics, by Bernard Blais, Global Strategist and Principal Manager, SAS
The exploding volume, complexity and velocity of big data present an increasing challenge to organizations, but also a significant opportunity to derive valuable insights. As organizations are tasked with managing massive data sets, it’s clear that the value of big data will be derived from the analytics that can be performed on it. Analytics is the key to identifying patterns, managing risks and tackling previously unsolvable problems. This presentation provides an overview of how to comprehensively tackle big data, including emerging strategies for information management, analytics, and high performance analytics.
Using SQL-MapReduce for Advanced AnalyticsTeradata Aster
Industry analyst Rick van der Lans explains how Aster Data's patent-pending SQL-MapReduce programming framework makes new types of analytic queries possible. The main benefits he outlines are: Parallelization of complex operations; Simplification of queries; Predictabile query performance; Efficient data access; and Linear scalability.
Check out Rick's complimentary research report at http://www.asterdata.com/ar_SQL-MapReduce_for_Advanced_Analytics/, in which he provides a very clear technical explanation of SQL-MapReduce and its analytic application use cases.
Mastering MapReduce: MapReduce for Big Data Management and Analysis
1. Mastering MapReduce Series, Session I: MapReduce for Big Data Management and Analysis. Curt Monash, Monash Research; Steve Wooledge, Aster Data; Peter Pawlowski, Aster Data; Eric Friedman, Aster Data. October 15th, 2009
2. Topics: Aster Data overview; SQL-MapReduce example; SQL-MapReduce applications; SQL-MapReduce syntax/example; Q&A
3. Aster Data: Creating the Next-Generation Data Management System. Aster Data was founded in 2005 to revolutionize the processing and management of very large data volumes. The founding team innovated on the ‘big data’ problem at Stanford University and was joined by big data experts from Google, Oracle, and Microsoft. Aster’s first commercial product, nCluster, has been on the market since 2007; customers include MySpace, LinkedIn, Coremetrics, Akamai, and others. Since 2008, Aster has innovated on Google’s well-known MapReduce framework to transform data processing, creating the patent-pending SQL-MapReduce (In-Database MapReduce).
18. Aster’s Solution: A Massively Parallel Data Warehouse With the Unique Ability to Embed Applications (Deeper, Faster Analytics on Big Data). The architecture, from the bottom up: (1) commodity hardware; (2) a massively parallel data store; (3) an MPP data warehouse with incremental scaling (scale by function), where embedded, parallelized apps (Java, .NET, packaged, and other apps) execute within the DB; (4) a dynamic workload manager (WLM); (5) high-volume, fast querying, with an industry-leading WLM supporting 300+ concurrent workloads; (6) a unified interface offering SQL and Aster’s SQL-MapReduce alongside standard interfaces. Key classes of applications: leading BI tools, custom Java applications, custom .NET applications, packaged analytic apps, and other applications (C, C++, Perl, Python…).
19. Aster SQL-MapReduce (SQL-MR): Bring your applications to the data. A "data-applications" development platform: a rich portfolio of supported languages (Java, .NET, Python, Ruby, Perl, C++, R, and more); use SQL to develop rich data apps; expressive flexibility; reusability across applications and reports.
26. Aster's Patent-Pending SQL-MapReduce enables faster, easier, and more powerful analytics. The SQL-MapReduce framework (for developers to create and extend) is flexible (MapReduce expressiveness, languages, polymorphism), performant (massive parallelization, computational push-down), and available (fault isolation, resource management). Powerful SQL-MR functions (for analysts to consume) offer deep insights (unlimited analytical power at your disposal) and ease of use (simply plug in to the SQL you know and love). The power of Aster's SQL-MapReduce framework: (1) Write a SQL-MR function in Java, C, etc. (2) Install it inside Aster nCluster. (3) Use and reuse it by invoking the SQL-MR function from SQL.
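To make the write/install/use cycle concrete, here is a minimal single-process Python sketch of the idea behind a partition function. Rows are grouped by a key (as with PARTITION BY), each group is sorted (as with ORDER BY), and the group is handed to user code that may emit any number of rows. The registry and the names `install_function` and `invoke` are hypothetical illustrations, not Aster's API. A real MPP engine would run each partition on the node that stores it instead of looping over them.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]
PartitionFn = Callable[[List[Row]], Iterator[Row]]

# Hypothetical registry of "installed" functions (not Aster's actual API).
_REGISTRY: Dict[str, PartitionFn] = {}


def install_function(name: str, fn: PartitionFn) -> None:
    """Step 2 (install): make a user-written function callable by name."""
    _REGISTRY[name] = fn


def invoke(name: str, rows: Iterable[Row], partition_by: str, order_by: str) -> List[Row]:
    """Step 3 (use): group rows by key, sort each group, run the function per group.

    Partitions are processed sequentially here; an MPP system would process
    them in parallel, next to the data.
    """
    groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[partition_by]].append(row)
    out: List[Row] = []
    for key in sorted(groups):  # keys must be mutually comparable
        partition = sorted(groups[key], key=lambda r: r[order_by])
        out.extend(_REGISTRY[name](partition))
    return out


# Step 1 (write): a user function that emits one summary row per partition.
def first_and_last_page(partition: List[Row]) -> Iterator[Row]:
    yield {
        "user_id": partition[0]["user_id"],
        "first_page": partition[0]["page_type"],
        "last_page": partition[-1]["page_type"],
    }


install_function("first_and_last_page", first_and_last_page)

clicks = [
    {"user_id": 2, "timestamp": 5, "page_type": "home"},
    {"user_id": 1, "timestamp": 3, "page_type": "bid"},
    {"user_id": 1, "timestamp": 1, "page_type": "home"},
    {"user_id": 2, "timestamp": 9, "page_type": "auction"},
]
print(invoke("first_and_last_page", clicks, partition_by="user_id", order_by="timestamp"))
```

Once installed, the same function can be reused with any input that has the expected columns. This is the reuse the slide describes, shown here at toy scale.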
35. [Comparison chart: "Traditional Database"; "Expensive HW & maintenance"; "Best of both worlds!"]
36. MapReduce Applications. Behavioral analytics (CRM): sequential pattern analysis (e.g., up-sell/cross-sell), spam/bot analysis, sessionization analysis. Risk and fraud analysis: consumer credit scoring/default risk, market risk/VaR, operational risk, etc.; fraud detection. Graph analysis: social network "connectedness" (e.g., SSSP, APSP, etc.). Text analysis: tokenization (e.g., word count, classification), natural language processing. Statistical analysis (machine learning): linear regression, k-means clustering, R Project algorithms.
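As a reference point for the text-analysis use case, here is a minimal sketch of the classic MapReduce word count, run in a single process. The map phase emits `(word, 1)` pairs, the shuffle groups the pairs by word, and the reduce phase sums each group. Tokenization here is deliberately naive (lower-casing and splitting on whitespace). This is the generic MapReduce pattern, not Aster's Tokenize function.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc: str) -> Iterator[Tuple[str, int]]:
    """Map: tokenize one document and emit (word, 1) for every token."""
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def word_count(docs: List[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(word_count(["big data big analytics", "Big data"]))
# {'big': 3, 'data': 2, 'analytics': 1}
```

Because each map call sees only one document and each reduce call sees only one key, both phases can be spread across machines. That property is what lets the pattern scale.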
37. Aster's SQL-MapReduce Library: pre-packaged SDK, SQL-MR APIs, and documentation. Pre-packaged SQL-MR sample functions: nPath: complex sequential analysis for time-series and behavioral pattern analysis. SSSP: single-source shortest path, a graph algorithm useful for fraud and segmentation analysis. Sessionize: session categorization based on a sequence of clicks within a specified timeout. Approximate percentiles: ultra-fast percentile (or N-tile) statistical distribution analysis. Linear regression: a statistical technique used to predict values based on a set of related variables. Tokenize: text analysis that splits strings into words, categorizes them, and does a word count.
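The Sessionize description can be illustrated with a small sketch. Each user's clicks are ordered by time, and a new session starts whenever the gap since the previous click exceeds the timeout. Three details are assumptions, not Aster's documented behavior: the strict "greater than" comparison, session ids that restart at 0 for each user, and timestamps given in seconds.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Click = Tuple[str, int]  # (user_id, timestamp in seconds); illustrative schema


def sessionize(clicks: List[Click], timeout: int) -> List[Tuple[str, int, int]]:
    """Return (user_id, timestamp, session_id) for every click.

    Within each user's time-ordered clicks, a new session begins when the gap
    since the previous click is greater than `timeout` seconds.
    """
    by_user: Dict[str, List[int]] = defaultdict(list)
    for user, ts in clicks:
        by_user[user].append(ts)

    result: List[Tuple[str, int, int]] = []
    for user in sorted(by_user):
        session = 0
        previous: Optional[int] = None
        for ts in sorted(by_user[user]):
            if previous is not None and ts - previous > timeout:
                session += 1
            result.append((user, ts, session))
            previous = ts
    return result


clicks = [("u1", 0), ("u1", 100), ("u1", 2000), ("u2", 50), ("u1", 2100)]
for row in sessionize(clicks, timeout=1800):
    print(row)
```

With a 30-minute (1800 s) timeout, the 1900 s gap between u1's clicks at 100 and 2000 starts that user's second session. u2's single click forms a session of its own.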
39. Requires dozens of SQL queries every N minutes (dozens of times per day)
52. What is Aster nPath? nPath is a SQL-MR function included with nCluster. It enables analysis of ordered data: clickstream data, financial transaction data, user interaction data, and anything of a time-series nature. nPath leverages the power of the SQL-MR framework to transcend SQL's limitations with respect to ordered data.
53. Example: Analyzing a Clickstream. Business question: How many distinct users start at the home page, click on an auction, view the seller's profile, and then bid on the item? Available data: a database table, clicks, populated with web log data, with the columns user_id, timestamp, and page_type.
54. The nPath query:

    SELECT count(DISTINCT user_id)
    FROM nPath(
        ON clicks
        PARTITION BY user_id
        ORDER BY timestamp
        MODE(OVERLAPPING)
        PATTERN('H.A.P.B')
        SYMBOLS(
            page_type = 'home' AS H,
            page_type = 'auction' AS A,
            page_type = 'profile' AS P,
            page_type = 'bid' AS B)
        RESULT(first(user_id of H) AS user_id)
    );

(1) Partition: form groups by user_id. (2) Order: sort each group by timestamp.
55. The nPath query, continued. (3a) Match: define a set of symbols. The SYMBOLS clause maps each page_type value to a symbol: H, A, P, or B. (3b) Match: define the subsequences of interest with a regular expression over those symbols. PATTERN('H.A.P.B') describes H followed by A, then P, then B.
56. The nPath query, continued. (4) Compute aggregates over the matched subsequences. RESULT(first(user_id of H) AS user_id) returns the user_id from the first row of each match. Because MODE(OVERLAPPING) can produce several matches per user, the outer SELECT uses count(DISTINCT user_id).
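The following single-machine Python sketch computes what the nPath query computes for this example, one step at a time. It is not how nCluster executes nPath. It makes two assumptions. First, "." in PATTERN means the symbols must fall on consecutive rows of the ordered partition, so an intervening page breaks a match. Second, each page_type maps to at most one symbol, so every row can be encoded as one character and matched with Python's `re` module.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# One row of the clicks table: (user_id, timestamp, page_type).
Row = Tuple[int, int, str]

# SYMBOLS(...): one character per symbol; rows matching no symbol become "_".
SYMBOLS = {"home": "H", "auction": "A", "profile": "P", "bid": "B"}

# PATTERN('H.A.P.B'): "." is read as "followed by" on consecutive rows (assumption),
# which becomes plain concatenation in a regex over the encoded rows.
PATTERN = "HAPB"


def npath_count_distinct(rows: List[Row]) -> int:
    # (1) Partition: group rows by user_id.
    partitions: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)

    matched_users: Set[int] = set()
    for part in partitions.values():
        # (2) Order: sort each group by timestamp.
        part.sort(key=lambda r: r[1])
        # (3a) Encode each row as its symbol character.
        encoded = "".join(SYMBOLS.get(page, "_") for _, _, page in part)
        # (3b) Match: a zero-width lookahead lets matches overlap, like MODE(OVERLAPPING).
        for m in re.finditer(f"(?={PATTERN})", encoded):
            # (4) RESULT(first(user_id of H)): user_id on the first row of the match.
            matched_users.add(part[m.start()][0])
    # Outer SELECT count(DISTINCT user_id).
    return len(matched_users)


clicks = [
    (1, 1, "home"), (1, 2, "auction"), (1, 3, "profile"), (1, 4, "bid"),  # matches
    (2, 1, "home"), (2, 2, "auction"), (2, 3, "bid"),  # no profile view: no match
    (3, 4, "bid"), (3, 3, "profile"), (3, 2, "auction"), (3, 1, "home"),  # matches once ordered
    (4, 1, "home"), (4, 2, "search"), (4, 3, "auction"), (4, 4, "profile"), (4, 5, "bid"),  # interrupted
]
print(npath_count_distinct(clicks))  # 2
```

Encoding rows as characters only works because each row matches a single symbol. When symbol predicates can overlap, a row can match several symbols, and a real engine has to try each alternative.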
57. Market Basket Analysis Example. Question: Detect customers who purchase the same category of items in three market baskets in a row, with a total value > $150.
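Here is a hedged Python sketch of the single-pass idea behind the market basket question above. It is not the deck's nPath query, and it makes three assumptions. The input has one row per basket, with a single category and a basket value. The baskets are ordered by a sequence number. "Total value > $150" means the sum over the three consecutive baskets. The function name `repeat_category_customers` is illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (customer_id, basket_seq, category, basket_value): one row per basket (assumed schema).
Basket = Tuple[str, int, str, float]


def repeat_category_customers(
    baskets: List[Basket], run: int = 3, min_total: float = 150.0
) -> Set[str]:
    """Customers with `run` consecutive baskets of one category totaling > min_total."""
    by_customer: Dict[str, List[Basket]] = defaultdict(list)
    for b in baskets:
        by_customer[b[0]].append(b)

    hits: Set[str] = set()
    for customer, rows in by_customer.items():
        rows.sort(key=lambda b: b[1])  # order each customer's baskets in time
        # One pass over the ordered baskets with a sliding window of length `run`.
        for i in range(len(rows) - run + 1):
            window = rows[i : i + run]
            same_category = len({b[2] for b in window}) == 1
            if same_category and sum(b[3] for b in window) > min_total:
                hits.add(customer)
                break
    return hits


baskets = [
    ("alice", 1, "toys", 60.0), ("alice", 2, "toys", 50.0), ("alice", 3, "toys", 45.0),
    ("bob", 1, "books", 80.0), ("bob", 2, "books", 90.0), ("bob", 3, "garden", 30.0),
    ("bob", 4, "books", 70.0),
    ("carol", 1, "food", 40.0), ("carol", 2, "food", 40.0), ("carol", 3, "food", 40.0),
]
print(repeat_category_customers(baskets))  # {'alice'}
```

Bob's run of books baskets is broken by a garden basket, and Carol's three food baskets total only $120. Each customer's ordered baskets are read once, with no self-joins. Expressing "three in a row" in plain SQL typically needs those joins or nested sub-selects.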
58. Two Methods, Same Answer: multi-pass nested sub-selects vs. a single-pass SQL-MR nPath query. [Chart comparing the two methods]
61. Upcoming Webcast: Mastering MapReduce Part II. Save the date: December 3rd. MapReduce resources: http://www.asterdata.com/mapreduce/index.php (recorded application use cases, code samples, and tutorials). DBMS2 on MapReduce: http://www.dbms2.com/category/parallelization/mapreduce/. Aster's SQL-MapReduce: http://www.asterdata.com/product/mapreduce.php and http://www.asterdata.com/blog/index.php/category/mapreduce/. TDWI technical whitepaper. Contact us: hello@asterdata.com, Steve.wooledge@asterdata.com. Thank you!