"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...Dataconomy Media
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr. Abdourahmane Faye, Big Data SME Lead DACH at HPE
Watch more from Data Natives Berlin 2016 here: http://bit.ly/2fE1sEo
Visit the conference website to learn more: www.datanatives.io
Follow Data Natives:
https://www.facebook.com/DataNatives
https://twitter.com/DataNativesConf
Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2017: http://bit.ly/1WMJAqS
About the Author:
Abdou Faye is a Subject Matter Expert in Big Data, Predictive Analytics/Machine Learning and Business Intelligence, with more than 19 years of experience in the field in various leadership and executive roles, spanning technical, architecture and sales perspectives. He recently joined HPE from SAP, where he had led the Predictive Analysis & Big Data CoE (Center of Excellence) business since 2010 for the DACH, CEE and CIS regions, in charge of business development and sales support. Prior to SAP, he worked for four years at Microsoft as a Senior BI & SQL Server Consultant in Switzerland, after ten years spent at Philip Morris (CH), Orange Telco (CH) and SEMA Group (FR). Abdou graduated from Paris 11 University in 2000, where he completed a PhD on data mining/predictive analytics after earning a master's degree in computer science.
"Industrializing Machine Learning – How to Integrate ML in Existing Businesse...Dataconomy Media
"Industrializing Machine Learning – How to Integrate ML in Existing Businesses", Erik Schmiegelow, CEO at Hivemind Technologies AG
Watch more from Data Natives Berlin 2016 here: http://bit.ly/2fE1sEo
Visit the conference website to learn more: www.datanatives.io
Follow Data Natives:
https://www.facebook.com/DataNatives
https://twitter.com/DataNativesConf
Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2017: http://bit.ly/1WMJAqS
About the Author:
Since 1996, Erik Schmiegelow has worked as a software architect and consultant, building large data processing platforms for companies such as NTT DoCoMo, Royal Mail, Siemens, E-Plus, Allianz and T-Mobile; until 2001 he was CTO at the Cologne-based digital agency denkwerk.
In 2007 he founded the telecommunications consulting agency Itellity, followed by Hivemind Technologies in 2014. Hivemind Technologies is a solutions and services company focused on big data analytics and stream-processing technologies for web, social data and industrial applications. Erik studied computer science in Hamburg.
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise..." (Dataconomy Media)
This document discusses data virtualization and how it can help organizations leverage data lakes to access all their data from disparate sources through a single interface. It addresses how data virtualization can help avoid data swamps, prevent physical data lakes from becoming silos, and support use cases like IoT, operational data stores, and offloading. The document outlines the benefits of a logical data lake created through data virtualization and provides examples of common use cases.
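To make the single-interface idea concrete, here is a minimal Python sketch of a logical view federating two differently-shaped sources at query time, with nothing replicated. All names (`CsvSource`, `ApiSource`, `LogicalView`) and the sample data are invented for illustration and are not Denodo's API.

```python
# Minimal illustration of the data-virtualization idea: one logical view
# over heterogeneous sources, resolved at query time instead of copied.
# All class and field names here are hypothetical, not Denodo's API.
import csv
import io

SALES_CSV = "region,amount\nEMEA,100\nAPAC,250\n"   # stand-in for a file/lake source

class CsvSource:
    """Adapter that normalizes a CSV source into dict rows."""
    def rows(self):
        return [dict(r) for r in csv.DictReader(io.StringIO(SALES_CSV))]

class ApiSource:
    """Adapter that normalizes an (imaginary) REST payload into dict rows."""
    def rows(self):
        payload = [{"region": "AMER", "amount": 300}]  # stand-in for a live call
        return [{"region": p["region"], "amount": str(p["amount"])} for p in payload]

class LogicalView:
    """A 'logical data lake' in miniature: one interface, many sources."""
    def __init__(self, *sources):
        self.sources = sources

    def query(self, predicate=lambda row: True):
        # Federation happens here, at read time; nothing is replicated.
        return [row for s in self.sources for row in s.rows() if predicate(row)]

view = LogicalView(CsvSource(), ApiSource())
print(view.query(lambda r: int(r["amount"]) > 150))  # rows from both sources
```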
IBM provides two types of accelerators for big data to speed the development and implementation of specific big data solutions: 1) Analytic accelerators that address specific data types or operations with advanced analytics; and 2) Application accelerators that address specific use cases and include both industry-specific and cross-industry features. The accelerators are packaged software components that provide business logic, data processing, and visualization capabilities and help eliminate the complexity of building big data applications. Examples of capabilities provided by various accelerators include text analytics, geospatial analysis, time series prediction, data mining, finance analytics, machine data analysis, social media insights, and telecommunications event data processing.
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe... (Romeo Kienzler)
The document discusses reference architectures for enterprise big data use cases. It begins by providing background on how databases have scaled over time and the evolution of large-scale data processing. It then discusses the basic idea behind big data use cases, which is to use all available data regardless of structure or source. The document outlines some key requirements like fault tolerance, dynamic scaling, and processing all data types. It proposes an architectural approach using NoSQL databases and cloud computing alongside traditional data warehousing. Finally, it shares two reference architectures - the current IBM approach and a transitional approach.
What is big data - Architectures and Practical Use Cases (Tony Pearson)
1. Big data is the analysis of large volumes of diverse data to identify trends, patterns and insights to make better business decisions. It allows companies to cost efficiently process growing data volumes and collectively analyze the broadening variety of data.
2. The document discusses architectures and practical use cases of big data. It provides examples of how companies are using big data to optimize operations, innovate new products, and gain instant awareness of fraud and risk.
3. Realizing the opportunities of big data requires thinking beyond traditional data sources to include machine, transactional, social, and enterprise content data. It also requires multiple platform capabilities like Hadoop, data warehousing, and stream computing.
Big Data & Analytics continues to redefine business. Data has transitioned from an underused asset to the lifeblood of the organisation, and a critical component of business intelligence, insight and strategy.
Big Data Scotland is the largest annual data analytics conference held in Scotland: it is supported by ScotlandIS and The Data Lab and free for delegates to attend. The conference is geared towards senior technologists and business leaders and aims to provide a unique forum for knowledge exchange, discussion and cross-pollination.
The programme will explore the evolution of data analytics; looking at key tools and techniques and how these can be applied to deliver practical insight and value. Presentations will span a wide array of topics from Data Wrangling and Visualisation to AI, Chatbots and Industry 4.0.
Key Topics
• Tools and techniques
• Corporate data culture, business processes, digital transformation
• Business intelligence, trends, decision making
• AI, Real-time Analytics, IoT, Industry 4.0, Robotics
• Security, regulation, privacy, consent, anonymization
• Data visualisation, interpretation and communication
• CRM and Personalisation
The document outlines an agenda for a presentation on big data. It discusses key topics like the state of big data adoption, a holistic approach to big data, five high value use cases, technical components, and the future of big data and cloud. The presentation aims to provide an overview of big data and how organizations can take a comprehensive approach to leveraging their data assets.
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...Dataconomy Media
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr. Abdourahmane Faye, Big Data SME Lead DACH at HPE
Watch more from Data Natives Berlin 2016 here: http://bit.ly/2fE1sEo
Visit the conference website to learn more: www.datanatives.io
Follow Data Natives:
https://www.facebook.com/DataNatives
https://twitter.com/DataNativesConf
Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2017: http://bit.ly/1WMJAqS
About the Author:
Abdou Faye is Subject Matter Expert in Big Data, Predictive Analytics / Machine Learning and Business Intelligence, with more than 19 years of experience in that area in various leading and executive roles, both from a Technical, Architecture and Sales perspectives. He recently joins HPE coming from SAP, where he was leading the Predictive Analysis & Big Data CoE (Center Of Excellence) business since 2010 for DACH, CEE and CIS region, in charge of Business Development and Sales Support. Prior to SAP, he worked 4 Years at Microsoft as Senior BI & SQL-Server Consultant in Switzerland, after 10 years spent at Philip Morris (CH), Orange Telco (CH) and SEMA Group (FR). Abdou graduated from Paris 11 University in 2000, where he completed a PhD on Data Mining/Predictive Analytics, after completing a Master in Computer Science.
"Industrializing Machine Learning – How to Integrate ML in Existing Businesse...Dataconomy Media
"Industrializing Machine Learning – How to Integrate ML in Existing Businesses", Erik Schmiegelow, CEO at Hivemind Technologies AG
Watch more from Data Natives Berlin 2016 here: http://bit.ly/2fE1sEo
Visit the conference website to learn more: www.datanatives.io
Follow Data Natives:
https://www.facebook.com/DataNatives
https://twitter.com/DataNativesConf
Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2017: http://bit.ly/1WMJAqS
About the Author:
Since 1996, Erik Schmiegelow has worked as a software architecht and consultant, building large data processing platforms for companies such as NTT DoCoMo, Royal Mail, Siemens, E-Plus, Allianz and T-Mobile; and until 2001 he was CTO at the Cologne-based digital agency denkwerk.
In 2007 he founded the telecommunications consulting agency Itellity, followed by Hivemind Technologies in 2014. Hivemind Technologies is a solutions and services company, focussed on big data analytics and stream processing technologies for web, social data and industrial applications. Erik studied computer sciences in Hamburg.
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...Dataconomy Media
This document discusses data virtualization and how it can help organizations leverage data lakes to access all their data from disparate sources through a single interface. It addresses how data virtualization can help avoid data swamps, prevent physical data lakes from becoming silos, and support use cases like IoT, operational data stores, and offloading. The document outlines the benefits of a logical data lake created through data virtualization and provides examples of common use cases.
IBM provides two types of accelerators for big data to speed the development and implementation of specific big data solutions: 1) Analytic accelerators that address specific data types or operations with advanced analytics; and 2) Application accelerators that address specific use cases and include both industry-specific and cross-industry features. The accelerators are packaged software components that provide business logic, data processing, and visualization capabilities and help eliminate the complexity of building big data applications. Examples of capabilities provided by various accelerators include text analytics, geospatial analysis, time series prediction, data mining, finance analytics, machine data analysis, social media insights, and telecommunications event data processing.
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...Romeo Kienzler
The document discusses reference architectures for enterprise big data use cases. It begins by providing background on how databases have scaled over time and the evolution of large-scale data processing. It then discusses the basic idea behind big data use cases, which is to use all available data regardless of structure or source. The document outlines some key requirements like fault tolerance, dynamic scaling, and processing all data types. It proposes an architectural approach using NoSQL databases and cloud computing alongside traditional data warehousing. Finally, it shares two reference architectures - the current IBM approach and a transitional approach.
What is big data - Architectures and Practical Use CasesTony Pearson
1. Big data is the analysis of large volumes of diverse data to identify trends, patterns and insights to make better business decisions. It allows companies to cost efficiently process growing data volumes and collectively analyze the broadening variety of data.
2. The document discusses architectures and practical use cases of big data. It provides examples of how companies are using big data to optimize operations, innovate new products, and gain instant awareness of fraud and risk.
3. Realizing the opportunities of big data requires thinking beyond traditional data sources to include machine, transactional, social, and enterprise content data. It also requires multiple platform capabilities like Hadoop, data warehousing, and stream computing.
Big Data & Analytics continues to redefine business. Data has transitioned from an underused asset to the lifeblood of the organisation, and a critical component of business intelligence, insight and strategy.
Big Data Scotland is the largest annual data analytics conference held in Scotland: it is supported by ScotlandIS and The Data Lab and free for delegates to attend. The conference is geared towards senior technologists and business leaders and aims to provide a unique forum for knowledge exchange, discussion and cross-pollination.
The programme will explore the evolution of data analytics; looking at key tools and techniques and how these can be applied to deliver practical insight and value. Presentations will span a wide array of topics from Data Wrangling and Visualisation to AI, Chatbots and Industry 4.0.
Key Topics
• Tools and techniques
• Corporate data culture, business processes, digital transformation
• Business intelligence, trends, decision making
• AI, Real-time Analytics, IoT, Industry 4.0, Robotics
• Security, regulation, privacy, consent, anonymization
• Data visualisation, interpretation and communication
• CRM and Personalisation
The document outlines an agenda for a presentation on big data. It discusses key topics like the state of big data adoption, a holistic approach to big data, five high value use cases, technical components, and the future of big data and cloud. The presentation aims to provide an overview of big data and how organizations can take a comprehensive approach to leveraging their data assets.
Data Science Out of The Box: Case Studies in the Telecommunication by Anand ... (Data Con LA)
Abstract: Telecommunications service providers (telcos) have access to massive amounts of historical and streaming data about subscribers. However, it often takes them a long time to build, operationalize and gain value from machine learning and analytic models, even for relatively common use cases like churn prediction, purchase propensity, next top-up or purchase prediction, subscriber profiling, customer experience modeling, recommendation engines and fraud detection. In this talk, I shall describe our approach to tackling this problem, which involved having a pre-packaged set of analytic pipelines on a scalable big data architecture that work on several standard and well-known telco data formats and sources, and that we were able to reuse across several different telcos. This allows telcos to deploy the analytic pipelines on their data, out of the box, and go live in a matter of weeks, as opposed to the several months it used to take when starting from scratch. In the talk, I shall describe our experiences in deploying the pre-packaged analytic pipelines with several telcos in North America, South East Asia and the Middle East. The pipelines work on a variety of historical and streaming data, including call data records with voice, SMS and data usage information, purchase and recharge behavior, location information, browsing/clickstream data, billing and payment information, smartphone device logs, and more. The pipelines run on a combination of Spark and Unscrambl BRAIN™, which includes a real-time machine learning framework, a scalable profile store based on Redis and an aggregation engine that stores efficient summaries of time-series data. I shall describe some of the machine learning models that get trained and scored as part of these pipelines. I shall also remark on how reusable certain models are across different telcos, and how a similar set of features can be used for models like next top-up or purchase prediction, churn prediction and purchase propensity across similar telcos in different geographies.
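As a hedged illustration of what one reusable step in such a pipeline might look like, the PySpark fragment below aggregates toy call-data records into per-subscriber usage features of the kind churn and propensity models consume. The schema, column names and values are invented; this is not Unscrambl's actual code or data model.

```python
# Hedged sketch of one step such a pre-packaged pipeline might contain:
# aggregating call-data records (CDRs) into per-subscriber features that a
# churn or propensity model could consume. Schema is an assumption.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cdr-features").getOrCreate()

cdrs = spark.createDataFrame(
    [("sub1", "voice", 120), ("sub1", "data", 512), ("sub2", "sms", 1)],
    ["subscriber_id", "service", "units"],
)

# One reusable aggregation: usage mix per subscriber, the kind of feature
# that transfers across telcos because CDR formats are broadly standard.
features = (
    cdrs.groupBy("subscriber_id")
        .pivot("service", ["voice", "sms", "data"])
        .agg(F.sum("units"))
        .na.fill(0)
)
features.show()
```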
The document discusses how modern software architectures can help tame big data. It introduces the speakers and provides an overview of WidasConcepts. The agenda includes a discussion of how big data can help businesses, an example of big data applied in the CarbookPlus platform, and new software architectures for big data. Real-time systems and architectures like lambda architecture are presented as ways to process big data at high velocity and volume. The conclusion emphasizes that big data improves business efficiency but requires tailored implementations and new skills.
Transforming GE Healthcare with Data Platform Strategy (Databricks)
Data and analytics are foundational to the success of GE Healthcare's digital transformation and market competitiveness. This use case focuses on a major platform transformation that GE Healthcare drove in the last year, moving from an on-prem legacy data-platform strategy to a cloud-native, completely services-oriented strategy. This was a huge effort for an $18 billion company, executed in the middle of the pandemic, and it enables GE Healthcare to leapfrog in its enterprise data analytics strategy.
Big data expert and Infochimps CEO, Jim Kaskade presents the Infinite Monkey Theorem at CloudCon Expo. He provides an energetic, inspiring, and practical perspective on why Big Data is disrupting. It’s more than historic data analyzed on Hadoop. It’s also more than real-time streaming data stored and queried using NoSQL. Learn more at www.Infochimps.com
Mastering MapReduce: MapReduce for Big Data Management and Analysis (Teradata Aster)
Whether you've heard of Google's MapReduce or not, its impact on big data applications, data warehousing, ETL, business intelligence, and data mining is reshaping the market for business analytics and data processing.
Attend this session to hear from Curt Monash on the basics of the MapReduce framework, how it is used, and what implementations like SQL-MapReduce enable.
In this session you will learn:
* The basics of MapReduce, key use cases, and what SQL-MapReduce adds
* Which industries and applications are heavily using MapReduce
* Recommendations for integrating MapReduce into your own BI and data warehousing environment (see the sketch below)
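For readers who have not seen the framework, this is a minimal plain-Python sketch of the map/shuffle/reduce pattern using the classic word-count example; it illustrates the programming model only, not SQL-MapReduce or any distributed implementation.

```python
# A minimal word-count in the map/reduce style the session introduces:
# map emits (key, value) pairs, shuffle groups by key, reduce folds each group.
# Pure-Python sketch; real frameworks distribute each phase across machines.
from collections import defaultdict

docs = ["big data moves fast", "fast data needs mapreduce"]

def map_phase(doc):
    for word in doc.split():
        yield word, 1                      # emit (word, 1) per occurrence

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)          # group values by key
    return groups

def reduce_phase(key, values):
    return key, sum(values)                # fold each group to a count

pairs = (pair for doc in docs for pair in map_phase(doc))
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)   # e.g. {'big': 1, 'data': 2, 'fast': 2, ...}
```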
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements (DataWorks Summit)
This presentation discusses forward-looking statements that are subject to risks and uncertainties. It addresses issues around who owns data, who has access to data, and what type of data analysis can be done. It provides details on government-to-government, bank-to-government, and regional data exchanges. It discusses Rante's divisions and approach to unique experiences. Rante aims to anticipate industry trends and push boundaries through research and technology innovations.
The document discusses how telecom companies are increasingly using Hadoop to manage and analyze large amounts of diverse data. It notes that 80% of telecom data will be stored on Hadoop platforms going forward. Hadoop provides more cost-effective storage and processing of data compared to traditional data warehouses. It allows telecom companies to gain more value from all their data by performing more flexible analyses and asking bigger questions of their data. The document outlines some of the key benefits of Hadoop architectures for telecom companies dealing with big data, including being able to retain more types of data for longer periods at a lower cost.
Introducing the Big Data Ecosystem with Caserta Concepts & Talend (Caserta)
This document summarizes a webinar presented by Talend and Caserta Concepts on the big data ecosystem. The webinar discussed how Talend provides an open-source integration platform that scales to handle large data volumes and complex processes, and gave an overview of Caserta Concepts' expertise in data management, big data analytics, and industries like financial services. It covered topics such as traditional vs. big data, Hadoop and NoSQL technologies, and common integration patterns between traditional data warehouses and big data platforms.
There are patterns for things such as domain-driven design, enterprise architectures, continuous delivery, microservices, and many others.
But where are the data science and data engineering patterns?
Sometimes, data engineering reminds me of cowboy coding - many workarounds, immature technologies and lack of market best practices.
SplunkSummit 2015 - Real World Big Data Architecture (Splunk)
This document discusses big data architectures using Splunk, Hadoop, and relational databases. It begins with an overview of Splunk's scalability and real-time analytics capabilities. It then discusses Hunk, an analytics platform for Hadoop that provides self-service analytics. The document also examines using structured data in Splunk and connecting to relational databases. A case study examines challenges with the open source Hadoop ecosystem. Finally, it outlines a real-world customer architecture that uses Splunk for machine data, Hadoop for storage, Hunk for analytics, and connects to relational databases.
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra... (Data Con LA)
The document discusses how an Enterprise Data Lake (EDL) provides a more effective solution for enterprise BI and analytics compared to traditional enterprise data warehouses (EDW). It argues that EDL allows enterprises to retain all datasets, service ad-hoc requests with no latency or development time, and offer a low-cost, low-maintenance solution that supports direct analytics and reporting on data stored in its native format. The document promotes EDL as a mainstream solution that should be part of every mid-sized and large enterprise's standard IT stack.
Modern Data Management for Federal Modernization (Denodo)
Watch full webinar here: https://bit.ly/2QaVfE7
Faster, more agile data management is at the heart of government modernization. However, traditional data delivery systems are limited in realizing a modernized and future-proof data architecture.
This webinar will address how data virtualization can modernize existing systems and enable new data strategies. Join this session to learn how government agencies can use data virtualization to:
- Enable governed, inter-agency data sharing
- Simplify data acquisition, search and tagging
- Streamline data delivery for transition to cloud, data science initiatives, and more
Big Data Commercialization and associated IoT Platform Implications by Ramnik... (Data Con LA)
The document discusses the evolution from M2M to the Internet of Things (IoT). It notes that the IoT landscape is driven by demand for automated turn-key solutions to reduce complexity, but that business models and value propositions are not well defined. It also discusses Verizon's role in connectivity and packaged solutions. It then provides more detail on Verizon's definition of IoT, the IoT data analytics market opportunity, and Verizon's Palomar and Sheriff analytics platforms and applications in healthcare fraud detection, automotive hacking monitoring, and voice network abuse detection.
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye... (SoftServe)
BI architecture drivers have to change to satisfy new requirements in format, volume, latency, hosting, analysis, reporting, and visualization. In this presentation, delivered at the 2014 SATURN conference, SoftServe's Serhiy and Olha showcased a number of reference architectures that address these challenges and speed up the design and implementation process, making it more predictable and economical:
- Traditional architecture based on an RDBMS data warehouse but modernized with column-based storage to handle a high load and capacity
- NoSQL-based architectures that address Big Data batch and stream-based processing and use popular NoSQL and complex event-processing solutions
- Hybrid architecture that combines traditional and NoSQL approaches to achieve completeness that would not be possible with either alone
The architectures are accompanied by real-life projects and case studies that the presenters have performed for multiple companies, including Fortune 100 and start-ups.
Global Big Data Conference Hyderabad - 2 Aug 2013 - Finance/Manufacturing Use Cases (Sanjay Sharma)
Financial institutions today are under intense pressure to provide more value to customers, reduce IT costs, and grow year over year. This challenge is further complicated by the huge amounts of data being generated, as well as the mandatory federal compliance requirements in place.
Similarly, the manufacturing industry today faces the challenge of processing huge amounts of data in real time and predicting failures as early as possible to reduce cost and increase production efficiency.
The session will cover some high-level big data use cases applicable to the financial and manufacturing domains, and show how big data technologies are being used successfully to meet these challenges, with examples from the credit card/banking industry in the financial domain and semiconductor production in the manufacturing domain.
3 Reasons Data Virtualization Matters in Your Portfolio (Denodo)
Watch the full session on-demand here: https://goo.gl/upxC5W
Real-Time Analytics for Big Data, Cloud & Self-Service BI
The world of data is only becoming more distributed. Privacy, regulations, and the need for real-time decisions are challenging organizations' legacy information strategies. This webinar includes an expert panel discussion on the Logical Data Warehouse, Universal Semantic Layer, and real-time analytics with Paul Moxon (VP of Data Architectures), Pablo Alvarez (Director of Product Management), and Alberto Pan (CTO).
Attend and learn:
• The major challenges of legacy information strategies.
• How data virtualization can help you overcome these challenges.
• Strategies for enabling agile data management and analytics.
This document discusses big data, including the large amounts of data being collected daily, challenges with traditional DBMS solutions, the need for new approaches like Hadoop and Aster Data to handle large volumes of structured and unstructured data, techniques for analyzing big data, and case studies of companies like Mobclix and Yahoo using big data solutions.
Overview of analytics and big data in practice (Vivek Murugesan)
Intended to give an overview of analytics and big data in practice, with a set of industry use cases from different domains. Useful for anyone trying to understand analytics and big data.
Carlos González of Hewlett Packard Enterprise talks about the implications of the big data market for HPE's business and the role that a solution like Vertica, together with Qlik, plays in it.
Top 3 Challenges to Profitable Mortgage Lending (Equifax)
Uncover how to transform mortgage lending processes and enhance the customer experience by conquering the top 3 challenges keeping your processors from focusing on the tasks that drive business.
This document discusses learning analytics and the differences between academic analytics and learning analytics. It provides:
- Definitions of academic analytics as focused on institutional decision making and management, while learning analytics focuses on supporting student learning and is aimed at learners and instructors.
- An overview of how learning analytics has evolved from traditional testing and assessment to incorporate larger datasets, models, personalization techniques, and insights from digital traces like online activity logs.
- Several examples of how learning analytics can provide insights at the individual student level, within groups, in the classroom, and across academic programs.
- Some of the challenges in implementing learning analytics, including issues around ethics, data access, and developing institutional capacity such as data science expertise.
This document discusses graph analytics and machine learning. It defines what a graph is from a mathematical perspective and provides examples of graph models for social networks. Key graph algorithms are described, including PageRank for identifying influential nodes, triangle counting for measuring clustering, and betweenness centrality. Graph databases are discussed as being optimized for connections versus relational databases which are optimized for aggregation. The document outlines opportunities for combining graph computing, machine learning, and big data technologies.
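To ground one of the algorithms mentioned above, here is a minimal PageRank by power iteration over a toy adjacency list; the graph and iteration count are arbitrary illustrations, and the 0.85 damping factor is the conventional choice.

```python
# Minimal PageRank by power iteration over an adjacency list, to make the
# "influential nodes" idea concrete. The tiny graph is invented for illustration.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

def pagerank(graph, damping=0.85, iterations=50):
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in graph}
        for node, out_links in graph.items():
            if out_links:
                share = damping * rank[node] / len(out_links)
                for target in out_links:
                    new_rank[target] += share   # distribute rank along edges
            else:                               # dangling node: spread evenly
                for target in new_rank:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank

print(sorted(pagerank(graph).items(), key=lambda kv: -kv[1]))  # 'c' ranks highest
```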
This document discusses how Eaton provides resilient and efficient IT infrastructure solutions powered by modern IT solutions. It highlights how integrated power management solutions from Eaton allow organizations to view and manage their entire power system from virtualization dashboards, keep critical loads running longer during outages, and ensure maximum business continuity. It also emphasizes how distributed control architectures from Eaton are inherently safer and enable safe auto-adaptation to changing load and power conditions.
The document provides an agenda for an Infoseminar on 3PAR storage solutions. It includes an introduction, overview of 3PAR positioning and capabilities such as virtualization, high availability, efficiency, and recovery manager for VMware. There will also be a demonstration, questions and answers, and information on next steps.
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl... (npinto)
Abstract:
Machine learning researchers and practitioners develop computer algorithms that "improve performance automatically through experience". At Google, machine learning is applied to solve many problems, such as prioritizing emails in Gmail, recommending tags for YouTube videos, and identifying different aspects from online user reviews. Machine learning on big data, however, is challenging: some "simple" machine learning algorithms with quadratic time complexity, while running fine on hundreds of records, are almost impractical to use on billions of records.
In this talk, I will describe lessons drawn from various Google projects on developing large-scale machine learning systems. These systems build on top of Google's computing infrastructure, such as GFS and MapReduce, and attack the scalability problem through massively parallel algorithms. I will present the design decisions made in these systems and strategies for scaling and speeding up machine learning systems on web-scale data.
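The quadratic-scaling point deserves explicit arithmetic. Assuming a machine that performs 10^8 comparisons per second (an invented but conservative figure), the sketch below shows why an O(n²) algorithm that is instantaneous on hundreds of records becomes hopeless on billions.

```python
# Back-of-envelope arithmetic for the quadratic-scaling point above.
# ops_per_sec is an assumed throughput, chosen only to show orders of magnitude.
ops_per_sec = 1e8

for n in (1e2, 1e6, 1e9):
    seconds = n * n / ops_per_sec          # O(n^2) pairwise work
    print(f"n = {n:>13,.0f}: {seconds:,.4f} s (~{seconds / 3.1536e7:,.1f} years)")

# n = 100:           ~0.0001 s
# n = 1,000,000:     ~10,000 s (a few hours)
# n = 1,000,000,000: ~1e10 s  (~317 years)
```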
Speaker biography:
Max Lin is a software engineer with Google Research in the New York City office. He is the tech lead of the Google Prediction API, a machine learning web service in the cloud. Prior to Google, he published research on video content analysis, sentiment analysis, machine learning, and cross-lingual information retrieval. He holds a PhD in Computer Science from Carnegie Mellon University.
Internet of Things, Connected Infrastructure & The Modern Supply Chain (Jeff Risley)
Every market has infrastructure -- physical assets needed for the operation of an enterprise. And every piece of infrastructure will be impacted by the Internet of Things -- physical "dumb" objects, embedded with sensors, connected to the internet, communicating with one another and people. Understanding this collision of the Internet and Infrastructure is important for the future of both. This is a presentation I gave at the CSCMP Annual Conference.
This document discusses Hewlett-Packard's large format printing operations in Barcelona, Spain over several decades. It highlights four key aspects: 1) Investing in talent through education and training programs. 2) A history of technological innovation in products. 3) Developing an understanding of customer needs. 4) The ability to adapt to changes in markets and customer demands. The combination of these factors along with a culture of innovation has enabled HP Barcelona to become a global leader in large format printing.
MT46 Virtualization Integration with Unity (Dell EMC World)
This session focuses on how virtualized application environments using Dell EMC Unity storage solutions are more agile, use IT resources more efficiently, and lower costs. With more than 90 integration points across the major virtualization stacks (VMware, Microsoft and OpenStack), Unity is the ideal storage solution for your use. Our discussion will include how storage services are represented and the policy-based storage provisioning methods used.
PuppetConf track overview: Modern InfrastructurePuppet
From containers to Docker, Mesos and Kubernetes — you'll hear about it at PuppetConf 2016 in San Diego. Learn more and register at https://puppet.com/puppetconf/.
Business Case Calculator for DevOps Initiatives - Leading credit card service... (Capgemini)
The 2015 World Quality Report data reveals that 61% of respondents rate time-to-market as very important, which is a key reason for the proliferation of DevOps. The biggest ingredient is speed, based on efficiencies upstream and in operations. Technology leaders now need to wear a business hat and build their strategy on the cost of achieving desired velocity, as opposed to cost savings.
Join MasterCard and Capgemini to learn about a real, time-to-market-driven DevOps business case calculator with technology, process and tool components.
Presented at HPE Discover Las Vegas 2016.
Top 3 Considerations for Machine Learning on Big Data (Datameer)
This document discusses considerations for machine learning on big data. It provides background on speakers Karen Hsu and Elliott Cordo. It then covers drivers and challenges of big data, including how companies like Amazon and Netflix have leveraged big data analytics. Alternatives to machine learning on big data like data mining, traditional BI, and visualization are discussed. Example use cases and key criteria around ease of use and quality for algorithms like clustering, column dependencies, and decision trees are presented. Best practices for machine learning on big data are provided for clustering, recommendations, and overall analytics processes. The document concludes with a polling question and call to action.
MT47 Modernize infrastructure for a modern data center (Dell EMC World)
Today's businesses need speed, efficiency and agility to deliver services back to their stakeholders, all at an affordable price. In the modern data center, flash storage, along with scale-out, software-defined solutions, helps automate the infrastructure at its foundation. This session will show you how Dell EMC's industry-leading storage portfolio can transform your company's infrastructure and drive your success. In addition, learn how to protect your modern data center with Dell EMC's comprehensive data protection portfolio.
Follow us at @DellEMCStorage
Learn more about Dell EMC All-Flash Solutions at DellEMC.com/All-flash.
Hewlett-Packard: Growing HP's advocate economy, presented by Zealous Wiley (SocialMedia.org)
In his SocialMedia.org Member Meeting case study presentation, Hewlett-Packard’s Digital Marketing Manager, Zealous Wiley, talks about how their enterprise software group is leveraging employee advocacy at HP.
Zealous explains how they're empowering and activating employees to increase awareness, gain share of voice, and generate preference for its enterprise software products and services.
A short presentation on Big Data & Machine Learning given at a seminar at the Cité Internationale Universitaire de Paris, Maison des Étudiants Arméniens.
Every second matters in sports and event broadcasting (SwitchOn to Eaton)
TV networks and broadcasters understand the importance of reliable power at their events. Eaton UPSs are often a "behind the scenes" part of delivering the TV or radio broadcast to the fans. Take a look back on some of Eaton's powerful plays in 2016. Learn more: switchon.eaton.com
This document discusses Hadoop and big data. It notes that digital data doubles every two years and that 85% of data is unstructured. Hadoop provides a cheaper way to store large amounts of both structured and unstructured data compared to traditional storage options. Hadoop also allows data to be stored first before defining what questions will be asked of the data.
Big Data Paris - A Modern Enterprise Architecture (MongoDB)
Since the 1980s, the volume of data produced and the risk associated with that data have literally exploded. 90% of the data in existence today was created in the last two years, and 80% of it is unstructured. With more users and the need for permanent availability, the risks are much higher.
What database parameters should a decision-maker take into account when deploying innovative applications?
This document discusses choosing the right data architecture for big data projects. It begins by acknowledging big data comes in many types, from structured transactional data to unstructured text data. It then presents several big data architectures and platforms that are suitable for different data types and use cases, such as relational databases, NoSQL databases, data grids, and distributed file systems. The document emphasizes that one size does not fit all and the right choice depends on the specific data and business needs.
Certus Accelerate - Building the business case for why you need to invest in ... (Certus Solutions)
The document discusses building a business case for investing in data by highlighting the large percentage of unstructured data growth across different industries like healthcare, government, utilities and media. It emphasizes that 80% of new data is unstructured and invisible to computers. The world is being rewritten in software code and cloud is the new platform for reimagining industries. It then discusses the need for predictive, prescriptive and cognitive systems to make sense of vast amounts of data. Investing in data integration, governance and master data management is essential to unlock insights from all data sources and provide a comprehensive view of information. Justifying such investments requires looking at the potential costs of data quality failures and benefits of avoiding rework.
This document summarizes a webinar on data as a service. It discusses how data virtualization through Denodo can enable agile business intelligence by providing pre-aggregated data to users quickly. It describes how Denodo creates API access to data, allows for an enterprise data marketplace, and integrates machine learning models to power operational AI. A demonstration of a personal COVID-19 risk monitor is provided.
How telecommunication companies can leverage the power of Hadoop and big data to derive use cases.
Based on Cloudera Whitepaper - Big Data Use Cases for Telcos
ADV Slides: How to Improve Your Analytic Data Architecture Maturity (DATAVERSITY)
Many organizations are immature when it comes to data use. The answer lies in delivering a greater level of insight from data, straight to the point of need. Enter: machine learning.
In this webinar, William will look at categories of organizational response to the challenge across strategy, architecture, modeling, processes, and ethics. Machine learning maturity levels tend to move in harmony across these categories. As a general principle of maturity models, you can’t skip levels in any category, nor can you advance in one category well beyond the others.
Vis-à-vis ML, attaining and retaining momentum up the model is paramount for success. You will ascend the model through concerted efforts delivering business wins utilizing progressive elements of the model, and thereby increasing your machine learning maturity. The model will evolve. No plateaus are comfortable for long.
With ML maturity markers, sequencing, and tactics, this webinar provides a plan for how to build analytic Data Architecture maturity in your organization.
Dell NVIDIA AI Powered Transformation in Financial Services Webinar (Bill Wong)
Digital transformation through data analytics and AI can help financial services firms address business, technology, and labor challenges caused by COVID-19. Key trends include increased reliance on remote work and digital platforms, and the importance of data analytics for decision making. By 2025, 90% of new apps will use AI. The document discusses NVIDIA and Dell Technologies' partnership and strategies for providing infrastructure to support AI workloads through solutions like the DGX A100 system, which can support training, inference, and analytics on one platform through technologies like GPUs and MIG. This helps provide a more flexible and efficient infrastructure compared to traditional siloed approaches.
DataOps - Big Data and AI World London - March 2020 (Harvinder Atwal)
Title
DataOps, the secret weapon for delivering AI, data science, and business intelligence value at speed.
Synopsis
● According to recent research, just 7.3% of organisations say the state of their data and analytics is excellent, and only 22% of companies are currently seeing a significant return from data science expenditure.
● Poor returns on data & analytics investment are often the result of applying 20th-century thinking to 21st-century challenges and opportunities.
● Modern data science and analytics require secure, efficient processes to turn raw data from multiple sources and in numerous formats into useful inputs to a data product.
● Developing, orchestrating and iterating modern data pipelines is an extremely complex process requiring multiple technologies and skills.
● Other domains have to successfully overcome the challenge of delivering high-quality products at speed in complex environments. DataOps applies proven agile principles, lean thinking and DevOps practices to the development of data products.
● A DataOps approach aligns data producers, analytical data consumers, processes and technology with the rest of the organisation and its goals.
The document discusses artificial intelligence (AI) and Capgemini's approach to AI. It provides examples of how AI can be applied in different industries and business functions. It also outlines Capgemini's AI platform, principles, and offerings. Capgemini aims to help clients implement impactful and scalable AI solutions through a combination of technology, services, and ecosystem partnerships.
The document discusses SAP's big data strategy and solutions. It outlines that SAP provides a full platform for big data, including tools for data ingestion, storage, processing, analytics, and applications. It also notes that SAP partners are key to success. Examples of SAP's big data solutions are presented, including predictive maintenance, fraud detection, and real-time optimization. The document emphasizes that SAP transforms businesses by enabling insights from massive, diverse data in real-time.
This document discusses big data perspectives, current and future. It notes that smartphones, tablets, and other connected devices now equal the world's population. It also discusses how connected vehicles utilize various systems and analytics. Finally, it discusses forecasts that 50 billion devices will be connected to the web by 2020 and how organizations are currently exploring and implementing big data technologies and analyzing traditional customer and transaction data.
This document discusses how utilities can transform big data into smart data to realize business value. It defines big data and provides examples of large amounts of data being generated. The document outlines how utilities can leverage big data and analytics to improve grid operations, asset and workforce management, and smart metering. This enables benefits like increased customer intimacy, more relevant insights, and competitive advantages. It provides examples of business issues utilities may want to address and presents an approach to answering business questions by selecting and building intelligent solutions.
A Journey Through The Far Side Of Data Sciencetlcj97
This document summarizes a presentation on data science and artificial intelligence. It discusses how AI is transforming businesses in many ways, including automating repetitive tasks, improving customer experiences, and driving revenue growth. It also mentions that while data is important, AI is needed to transform organizations through intelligent process optimization and innovation. The document provides examples of how various companies are applying AI in sales, customer service, and other areas. It emphasizes that AI strategies should focus on innovation, identifying high-impact use cases, and developing people's data science skills.
Big Data: Its Characteristics And Architecture CapabilitiesAshraf Uddin
This document discusses big data, including its definition, characteristics, and architecture capabilities. It defines big data as large datasets that are challenging to store, search, share, visualize, and analyze due to their scale, diversity and complexity. The key characteristics of big data are described as volume, velocity and variety. The document then outlines the architecture capabilities needed for big data, including storage and management, database, processing, data integration and statistical analysis capabilities. Hadoop and MapReduce are presented as core technologies for storage, processing and analyzing large datasets in parallel across clusters of computers.
Building the Cognitive Era : Big Data StrategiesKevin Sigliano
This document discusses big data and its applications. It begins with an overview of the growth of data and defines big data. Examples are given of how companies like Walmart, the CIA, and Puig use big data. The challenges of big data including volume, veracity, velocity and variety are described. Common applications of big data like customer insights, marketing, and risk detection are mentioned. The document outlines a roadmap for implementing a big data strategy and discusses technologies and terms. Success cases in fast moving consumer goods are presented. Finally, the benefits of big data for survival, strategic decisions, and cost reductions are noted.
Gain New Insights by Analyzing Machine Logs using Machine Data Analytics and BigInsights.
Half of Fortune 500 companies experience more than 80 hours of system downtime annually. Spread evenly over a year, that amounts to approximately 13 minutes every day. As a consumer, the thought of online bank operations being inaccessible so frequently is disturbing. As a business owner, when systems go down, all processes come to a stop. Work in progress is lost, and failure to meet SLAs and contractual obligations can result in expensive fees, adverse publicity, and the loss of current and potential future customers. Ultimately, the inability to provide a reliable and stable system results in lost revenue. While the failure of these systems is inevitable, the ability to predict failures in time and intercept them before they occur is now a requirement.
A possible solution to the problem can be found in the huge volumes of diagnostic big data generated at the hardware, firmware, middleware, application, storage and management layers, indicating failures or errors. Machine analysis and understanding of this data is becoming an important part of debugging, performance analysis, root cause analysis and business analysis. In addition to preventing outages, machine data analysis can also provide insights for fraud detection, customer retention and other important use cases.
Similar to "From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning", Marco Gessner, Corporate Systems Engineer at HPE (20)
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...Dataconomy Media
The challenges of increasing complexity of organizations, companies and projects are obvious and omnipresent. Everywhere there are connections and dependencies that are often not adequately managed or not considered at all because of a lack of technology or expertise to uncover and leverage the relationships in data and information. In his presentation, Axel Morgner talks about graph technology and knowledge graphs as indispensable building blocks for successful companies.
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...Dataconomy Media
The document discusses emerging technologies and their potential impacts, and questions how individuals and societies can responsibly address issues arising from new technologies. It notes that governments, regulators, and individuals struggle to understand new concepts that spread rapidly. It asks if there are existing systems or forms of cooperation that could help societies address responsibilities related to technologies, but offers no definitive solutions, mainly posing questions.
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...Dataconomy Media
Every day we are challenged with more data, more use cases and an ever-increasing demand for analytics. In this talk Bjorn will explain how autonomous data management and machine learning help innovators be more productive, and give examples of how to deliver new data-driven projects with less risk at lower cost.
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo...Dataconomy Media
This document contains an agenda and presentation materials for a talk on building and deploying an anti-money laundering (AML) model using DataRobot. The agenda includes introductions to DataRobot and AML, an AML demo, a real AML use case example, and a question and answer section. The presentation materials provide background on DataRobot, including its history and products. It also gives an overview of money laundering and how AML works, both traditionally using rule-based systems and how machine learning can help by reducing false positives and improving efficiency. A case study shows how DataRobot has helped other organizations with AML use cases.
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...Dataconomy Media
Trump, Brexit, Cambridge Analytica... In the last few years, we have had to confront the consequences of the use and misuse of data science algorithms in manipulating public opinion through social media. The use of private data to microtarget individuals is a daily practice (and a trillion-dollar industry), which has serious side-effects when the selling product is your political ideology. How can we cope with this new scenario?
Data Natives Vienna v 7.0 | "Building Kubernetes Operators with KUDO for Dat...Dataconomy Media
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...Dataconomy Media
The document discusses data innovation and Men on the Moon's approach. It notes that while there is a large amount of available data worldwide, only a small portion is used to create value. Most data science projects also fail. The document then outlines Men on the Moon's "Data Thinking" approach, which combines design thinking and data science. Their approach involves defining a data vision, identifying use cases, prototyping solutions, and enabling employees. The goal is to leverage data to create valuable solutions for people through data innovation.
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness...Dataconomy Media
What does it take to build a good data product or service? Data practitioners always think about the technology, user experience and commercial viability. But rarely do they think about the implications of the systems they build. This talk will shed light on the impact of AI systems and the unintended consequences of the use of data in different products. It will also discuss our role, as data practitioners, in planting the seeds of fairness in the systems we build.
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...Dataconomy Media
People analytics uses data science techniques like machine learning and pattern recognition on employee data to generate insights and reports that can help businesses make smarter talent and operational decisions. These decisions can improve workforce effectiveness, engagement, recruitment, retention and performance while also increasing sales and reducing fraud and accidents. People analytics technologies include surveys, correlation analysis, machine learning and AI which can help companies improve their culture, develop employee skills and boost growth when the results are properly implemented.
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...Dataconomy Media
Cloud infrastructure is a hostile environment: a power supply failure or a network outage leads to downtime and big losses. There is nothing we can trust: a single server, a server rack, even a whole datacenter can fail, and if an application is fragile by design, disruption is inevitable. We must distribute our application and diversify our cloud data strategy to survive disturbances of any scale. Apache Cassandra is a cloud-native, platform-agnostic database that stores data with distributed redundancy, so it easily survives any such issue. Want to know how Apple and Netflix handle petabytes of data, keeping it highly available? Join us and listen to a story of 10 little servers and no downtime!
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...Dataconomy Media
In the data industry, having correctly labelled datasets is vital. Timothy Thatcher explains how tagging your data at scale can be handled while taking into account time, location and complex hierarchical rules.
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...Dataconomy Media
This document discusses using machine learning to analyze individual and interpersonal behavior for clinical diagnosis and screening. It focuses on analyzing non-verbal behaviors like interpersonal synchronization that have been shown to be impaired in conditions like autism spectrum disorder. The document proposes that machine learning could provide an objective, automated tool for diagnosing conditions more quickly by analyzing video recordings of social interactions. This may help address bottlenecks in healthcare systems and allow earlier access to treatment.
Data Natives Berlin v 20.0 | "Serving A/B experimentation platform end-to-end"...Dataconomy Media
This document discusses the end-to-end experimentation platform at GetYourGuide for A/B testing. It outlines the challenges of running experiments such as imbalanced assignments, suspicious metric changes, and non-converging results. It also describes the tools used for planning experiments, monitoring assignments, performing daily checks, and analyzing results. The goal is to validate UX changes, estimate effects on customers, and make more objective decisions through A/B testing while addressing issues that could impact experiment quality.
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...Dataconomy Media
Cloud infrastructure is a hostile environment: a power supply failure or a network outage leads to downtime and big losses. There is nothing we can trust: a single server, a server rack, even a whole datacenter can fail, and if an application is fragile by design, disruption is inevitable. We must distribute our application and diversify our cloud data strategy to survive disturbances of any scale. Apache Cassandra is a cloud-native, platform-agnostic database that stores data with distributed redundancy, so it easily survives any such issue. Want to know how Apple and Netflix handle petabytes of data, keeping it highly available? Join us and listen to a story of 10 little servers and no downtime!
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...Dataconomy Media
Creativity is the mental ability to create new ideas and designs. Innovation, on the other hand, means developing useful solutions from new ideas. Creativity can be goal-oriented, whereas innovation is always goal-oriented: innovation aims to achieve defined goals. The use of cloud services and technologies promises enterprise users many benefits in terms of more flexible use of IT resources and faster access to innovative solutions. That's why, in this talk, we want to examine the question of what role cloud computing plays for innovation in companies.
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...Dataconomy Media
A presentation of the time-series properties of financial instruments and the possibilities for frequency decomposition and information extraction using the FT, STFT and wavelets, with an outlook on current research on wavelet neural networks.
Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...Dataconomy Media
"With most machine learning (ML) and deep learning (DL) frameworks, it can take hours to move data for ETL, and hours to train models. It's also hard to scale, with data sets increasingly being larger than the capacity of any single server. The amount of the data also makes it hard to incrementally test and retrain models in near real-time.
Learn how Apache Ignite and GridGain help to address limitations like ETL costs, scaling issues and Time-To-Market for the new models and help achieve near-real-time, continuous learning.
Yuriy Babak, the head of ML/DL framework development at GridGain and Apache Ignite committer, will explain how ML/DL work with Apache Ignite, and how to get started.
Topics include:
— Overview of distributed ML/DL including architecture, implementation, usage patterns, pros and cons
— Overview of Apache Ignite ML/DL, including built-in ML/DL algorithms, and how to implement your own
— Model inference with Apache Ignite, including how to train models with other libraries, like Apache Spark, and deploy them in Ignite
— How Apache Ignite and TensorFlow can be used together to build distributed DL model training and inference"
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...Dataconomy Media
"Machine learning algorithms require significant amounts of training data which has been centralized on one machine or in a datacenter so far. For numerous applications, such need of collecting data can be extremely privacy-invasive. Recent advancements in AI research approach this issue by a new paradigm of training AI models, i.e., Federated Learning.
In federated learning, edge devices (phones, computers, cars etc.) collaboratively learn a shared AI model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. From personal data perspective, this paradigm enables a way of training a model on the device without directly inspecting users’ data on a server. This talk will pinpoint several examples of AI applications benefiting from federated learning and the likely future of privacy-aware systems."
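To make the paradigm concrete, here is a toy federated-averaging (FedAvg) sketch in Python, illustrative only and not any particular framework's API: each simulated device takes a gradient step on its private data, and the server averages only the resulting model weights.

```python
# Toy federated averaging (FedAvg): devices train locally on private data;
# only model weights are shared and averaged, never the raw data.
import numpy as np

def local_update(w, X, y, lr=0.1):
    """One gradient step of linear least squares on one device's data."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
# five simulated edge devices, each with its own private dataset
devices = []
for _ in range(5):
    X = rng.normal(size=(50, 2))
    devices.append((X, X @ true_w + rng.normal(scale=0.1, size=50)))

w = np.zeros(2)
for _ in range(20):  # communication rounds
    # every device trains locally; the server only averages the weights
    w = np.mean([local_update(w, X, y) for X, y in devices], axis=0)
print(w)  # approaches true_w without raw data ever leaving a device
```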
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"sameer shah
Embark on a captivating financial journey with 'Financial Odyssey,' our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance.
Build applications with generative AI on Google CloudMárton Kodok
We will explore Vertex AI Model Garden powered experiences and learn more about the integration of these generative AI APIs, seeing in action what the Gemini family of generative models offers developers for building and deploying AI-driven applications. Vertex AI includes a suite of foundation models, referred to as the PaLM and Gemini families of generative AI models, which come in different versions. We will cover how to use them via the API to:
- execute prompts in text and chat
- cover multimodal use cases with image prompts
- fine-tune and distill models to improve knowledge domains
- run function calls with foundation models to optimize them for specific tasks
At the end of the session, developers will understand how to innovate with generative AI and develop apps that follow current generative AI industry trends.
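For orientation, a minimal sketch of what prompting a Gemini model through the Vertex AI Python SDK can look like; the project ID is a placeholder, and model names and the SDK surface change quickly, so treat this as an illustration rather than the talk's exact code:

```python
# Minimal sketch: prompting a Gemini model via the Vertex AI Python SDK.
# "your-gcp-project" is a placeholder; check current docs for model names.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-gcp-project", location="us-central1")
model = GenerativeModel("gemini-pro")
response = model.generate_content("Explain function calling in one paragraph.")
print(response.text)
```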
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change?", the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and the worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Monthly Management report for the Month of May 2024
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning", Marco Gessner, Corporate Systems Engineer at HPE
1. From Big Data To Big Value with HPE Vertica's Predictive Analytics & Machine Learning
Marco Gessner, Corporate Systems Engineer – marco.gessner@hpe.com
Zurich, 17th Nov 2016
2. The world is changing and accelerating
Big Data is no longer just a buzzword – it's EVERYWHERE and growing … and this is just what's structured …
[Chart: the digital universe grows from 0.1 ZB (2005) and 1.2 ZB (2010) to 2.8 ZB (2012), 8.5 ZB (2015) and an estimated 40 ZB by 2020*, driven by the volume, variety and velocity of mobile, cloud, transactional data (CRM, SCM, ERP), social media, IT ops, log files and sensor/counter data – with the value still an open question.]
IDC estimates that by 2020, business transactions on the internet – business-to-business and business-to-consumer – will reach 450 billion per day.
*Source: IDC Digital Universe in 2020
3. Enterprises realize only 10-15% of the value expected on their big data investments
Barriers: silos and lack of alignment, the technology gap, and translating data to value.
– 55%: "determining how to get value from big data" is a top-3 challenge with big data***
– 41%: don't know if big data ROI will be positive or negative****
– 57%: "obtaining the necessary skills and capabilities needed" is a top-3 challenge for Hadoop**
– 41%: systems cannot process large volumes of data from different sources*
* PwC – Capitalizing on the promise of Big Data – 1/2013
** Gartner – Survey Analysis: Hadoop Adoption Drivers and Challenges – 5/2015
*** Gartner – Survey Analysis: Practical Challenges Mount as Big Data Moves to Mainstream – 9/2015
**** Lisa Kart, Gartner – Big Data Industry Insights – presentation
4. Big Data Value Model: business value grows with the level of intelligence
1 – Descriptive Analytics: What happened? (Business Intelligence)
2 – Diagnostic Analytics: Why did it happen? (Business Intelligence)
3 – Predictive Analytics: What will happen? (Predictive / Machine Learning)
4 – Prescriptive Analytics: How can we make it happen? (Predictive / Machine Learning)
5. A Real-World Example: smart meter data for 38 million households in a country for 13 months
counter_id |reading_ts |notification_ts |tc|reading |power
100012700004100|2008-11-30 23:00:05|2008-11-30 23:01:29|FT|271006206|273
100012700014100|2008-12-01 05:30:15|2008-12-01 05:33:53|HT|203859364|294
100012700014100|2008-12-01 21:30:17|2008-12-01 21:34:27|LT|922915648|1472
• 73 bytes per line, one reading every 10 minutes
• × 144 lines per day (6 per hour × 24 hours)
• × 38,000,000 households
• × 395 days (13 months of data retention)
= 157,785,120,000,000 bytes ≈ 158 terabytes
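The slide's arithmetic checks out; a quick back-of-the-envelope verification in Python (variable names are ours):

```python
# Verify the smart-meter data volume from the slide.
bytes_per_line = 73            # one reading line every 10 minutes
lines_per_day = 6 * 24         # 6 readings per hour x 24 hours = 144
households = 38_000_000
retention_days = 395           # roughly 13 months

total_bytes = bytes_per_line * lines_per_day * households * retention_days
print(f"{total_bytes:,} bytes = {total_bytes / 1e12:.0f} TB")
# -> 157,785,120,000,000 bytes = 158 TB
```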
7. The Vertica Real-Time Analytics Engine
• Leverages BI, ETL, Hadoop/MapReduce and OLTP investments
• No disk I/O bottleneck – simultaneously load & query (24/7 load & query)
• Native DB-aware clustering on low-cost x86 Linux nodes
• Built-in redundancy that also speeds up queries
• Automatic setup, optimization, and DB management
• Up to 90% space reduction using 10+ algorithms
• 50x–1000x faster than traditional RDBMS
• Scales from TB to PB with industry-standard hardware
• Simple integration with existing ETL and BI solutions
• SQL-99+ compliant
• Ultimate deployment flexibility
• Extended advanced analytics
8. #SeizeTheData
Building Machine Learning (ML) into the Core of Vertica – new capabilities deliver predictive analytics at speed and scale:
– Runs in parallel across hundreds of nodes in a Vertica cluster (Node 1, Node 2, …, Node n)
– Eliminates all the data duplication typically required by alternative vendor offerings
– No need for down-sampling, which can lead to less accurate predictions
– A single system for SQL analytics and Machine Learning
9. A few ML & PA functions for answering various business questions
– Classification / Scoring: who will churn, commit fraud or buy next week or next month?
– Regression: how many products will a customer buy next month or next quarter?
– Segmentation / Clustering: what are the groups of customers with similar behavior or profiles?
– Forecasting: what will the monthly revenue or the number of churners be next year?
– Recommendations: what is the best offer or action for a customer or internet user?
10. Some ML & PA Use Cases by Industry
– Telecommunications: customer churn, network optimization (forecast system load), cross-/up-selling, customer retention, network fraud detection
– Banking/Finance: credit risk management, anti-money laundering, fraudulent card usage detection; identify key behaviors of customers likely to leave the bank
– Retail/CPG: forecasting, inventory planning, cross-/up-selling, customer segmentation, market basket analysis; intelligent selection of store locations based on demographics
– Public Sector: logistics optimization, fraud prevention; predict community movement and trends that affect taxing districts; anticipate revenue
– Healthcare: health management, fraud prevention; medical: predict the causes, likelihood and spread of disease, genome analysis, research
– Insurance: customer profitability, fraudulent claims detection and prevention; perhaps even creating insurance tariffs in the first place – the "ML algorithms" were done by hand when insurance began in Venice and London in the 17th century
– Automotive: market demand forecasting, launch analysis (predict the best-selling configuration), service parts optimization, customer satisfaction, predictive maintenance in production, reacting to customer tastes
– Manufacturing/Wholesale: price optimization, assortment planning, forecasting; predictive maintenance
– Utilities: predictive asset maintenance, market and credit risks; forecast demand and usage for seasonal operations and provide anticipated resources; forecast electricity consumption by geography for powering up or throttling nuclear power stations 20 hours in advance
11. Use Cases by LoB…
Sales, Service, Finance & Marketing:
– Market basket analysis
– Customer loyalty programs
– Cross-sell and up-sell opportunities
– Marketing campaign response rates
– Better pipeline and revenue forecasting
– Service and maintenance staffing and planning
– Logistics and inventory management
– Predictive asset maintenance
– Significant improvements to early (beginning-of-quarter) pipeline forecasting accuracy
– Improved revenue quality through increased effectiveness of sales and marketing
– Better service profitability through improved renewals and pricing
– Increased revenue from segmented customers and targeted campaigns
Human Resources:
– Accurate prediction of churn/retirement for staffing and planning; enable proactive churn prevention by targeting likely churn candidates = greater retention of top employees
– Recruitment (headcount) and retention planning
– Employee performance and productivity: identify factors influencing high performance
– Workforce training enablement and effectiveness
– Succession planning
– More accurate forecasting to enable efficient recruiting
– Better operational planning for new hires (e.g. facilities requisitions, space, equipment)
Operations:
– More accurate orders based on customer demand; ensure inventory is adequately positioned to satisfy demand
– Increase turns with a more efficient balance of stock based on demand
IT:
– Optimize staffing to handle trends/seasonality in support demand (avoid over-staffing to reduce cost; avoid under-staffing to improve customer satisfaction)
– Proactive predictive maintenance prevents costly unscheduled downtime due to equipment failure; enables scheduling of planned maintenance windows
– Support/call center analysis
– Asset utilization demand planning
– Procurement planning/forecasting
– Project optimization and assessment
– Anticipate peak performance issues and their root cause
Read how Google is using advanced analytics to improve their HR: http://online.wsj.com/article/SB124269038041932531.html?hat_input=google+staffing#articleTabs=articles
12. #SeizeTheData
Machine Learning Pack – 8.0
– Algorithms (each with model training, prediction and evaluation): Linear Regression, Logistic Regression, K-means
– Model management: summarize, rename and delete models
– Data preparation: normalization, imbalanced-data processing, sampling
– R integration
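The deck doesn't show the pack's SQL syntax, so here is a conceptual sketch of the same train → predict → evaluate workflow in Python with scikit-learn; it mirrors the algorithm families and the normalization step listed above, but it is not Vertica's in-database API:

```python
# Conceptual train -> predict -> evaluate workflow mirroring the ML pack's
# capability matrix, shown with scikit-learn (NOT Vertica's in-database SQL).
import numpy as np
from sklearn.preprocessing import StandardScaler      # "normalization"
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y_cont = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=1000)
y_bin = (y_cont > 0).astype(int)

X_std = StandardScaler().fit_transform(X)             # data preparation
lin = LinearRegression().fit(X_std, y_cont)           # model training
log = LogisticRegression().fit(X_std, y_bin)
km = KMeans(n_clusters=3, n_init=10).fit(X_std)

print(lin.score(X_std, y_cont))                       # evaluation: R^2
print(log.score(X_std, y_bin))                        # evaluation: accuracy
print(km.predict(X_std[:5]))                          # prediction: cluster labels
```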
13. Linear Regression Use Cases
– Real Estate: model residential home prices (response) as a function of the home's living area, number of bedrooms, number of bathrooms and so on (predictors)
– Demand Forecasting: model the demand for a service or good (response) based on its features (predictors); for example, demand for different models of laptops based on monitor size, weight, price, operating system, etc.
– Manufacturing: determine the linear relationship between the compressive strength of concrete (response) and varying amounts of its components (predictors), like cement, slag, fly ash, water, superplasticizer, coarse aggregate, etc.
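As a minimal illustration of the real-estate case, a sketch with invented numbers (scikit-learn in Python, not Vertica SQL):

```python
# Hypothetical home-price example: price as a linear function of living
# area, bedrooms and bathrooms (all data invented for illustration).
import numpy as np
from sklearn.linear_model import LinearRegression

# columns: living area (sqm), bedrooms, bathrooms
X = np.array([[80, 2, 1], [120, 3, 2], [150, 4, 2], [200, 5, 3]])
y = np.array([250_000, 340_000, 410_000, 520_000])   # sale prices

model = LinearRegression().fit(X, y)
print(model.predict([[130, 3, 2]]))   # price estimate for an unseen home
```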
14. Logistic Regression Use Cases
– Finance: use a loan applicant's credit history, income and loan conditions (predictors) to determine the probability that the applicant will default on the loan (response); the result can be used for approving, denying, or changing loan terms
– Engineering: predict the likelihood that a particular mechanical part of a system will malfunction or require maintenance (response) based on operating conditions and diagnostic measurements (predictors)
– Medicine: determine the likelihood of a patient's successful response to a particular medicine or treatment (response) based on factors like age, blood pressure, and smoking and drinking habits (predictors)
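A similar sketch for the finance case, again with invented data (scikit-learn, not Vertica SQL); the fitted model yields a default probability that could drive approve/deny/reprice decisions:

```python
# Hypothetical loan-default example: probability of default from income
# and loan amount (all data invented for illustration).
import numpy as np
from sklearn.linear_model import LogisticRegression

# columns: annual income (kEUR), loan amount (kEUR)
X = np.array([[30, 50], [45, 20], [60, 30], [25, 60], [80, 10], [35, 55]])
y = np.array([1, 0, 0, 1, 0, 1])      # 1 = defaulted

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[40, 40]])[0, 1])   # P(default) for a new applicant
```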
15. K-means Clustering Use Cases
– Customer Segmentation: segment customers and buyers into distinct groups (clusters) based on similar attributes like age, income, product preferences, etc., in order to target promotions, provide support and explore cross-sell opportunities
– Fraud Detection: identify individual observations that don't align with any distinct group (cluster), and identify the types of clusters that are more likely to be at risk of fraudulent behavior
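Both use cases fit in one small sketch with invented data (scikit-learn, not Vertica SQL): the cluster labels give the segments, and a large distance to the nearest centroid flags observations that fit no group:

```python
# Hypothetical customer-segmentation example: cluster customers by age and
# income, then check how far a new point is from any cluster (data invented).
import numpy as np
from sklearn.cluster import KMeans

# columns: age, annual income (kEUR)
X = np.array([[25, 30], [27, 32], [45, 80], [47, 85], [63, 40], [65, 42]])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_)                        # segment assignment per customer
print(km.transform([[30, 90]]).min())    # distance to the nearest centroid:
                                         # large values flag potential outliers
```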
16. Architected to embrace an ecosystem of innovation
[Diagram: HPE Vertica at the center of an ecosystem spanning advanced analytics, cloud, BI/visualization, platform, and data transformation partners]
17. HPE Big Data Software
– Performance and scale
– Consume and deploy anywhere
– Open source without compromise
– Machine learning accessible to all
– Advanced analytics from the core
– Analytics is everything we do
19. HPE Vertica empowers Philips to transform customer service from reactive to proactive
Big Data & Analytics: the Philips Predictive Maintenance Platform
HPE Vertica technology empowers Philips to transform customer services from reactive to proactive using big data and analytics:
• 24 different data sources (events, errors, sensor and business data) integrated into a single Vertica database
• 10K+ connected MRI systems
• 38 predictive and proactive maintenance models, with 25 additional models in development
• 140 billion rows and growing fast
• 60+ TB of historic data
• 1M weekly system files processed
Business outcomes:
– Decreased unplanned downtime
– Increased scheduled maintenance
– Improved customer satisfaction
– Input for R&D to optimize product quality
– Foundation for many added-value services
Reactive: system problem → customer call → dispatch FSE → on-site diagnosis → parts delivery → repair/replace → system functional
Proactive: failure prediction → scheduled service → problem avoided
20. Learn More About – and Try! – HPE Vertica
Community Edition: free download, 1 TB, 3 nodes – my.vertica.com/