Relevant recommendations play a major role in the positive buyer experience and have become a critical tool for online retailers, banks, insurance providers and most all other industries.
The Big Data platform provides a scalable, flexible and cost effective infrastructure to create a recommendation engine that provides exponential improvement over traditional psych-demographic profiling and "old world" targeted marketing.
Caserta Concepts and Datameer held a webinar to share how to integrate unstructured log data with traditional data warehouse data on Hadoop to build a robust Big Data Recommendation Engine.
Speakers:
Elliott Cordo, Principal Consultant, Caserta Concepts
Adam Gugliciello, Solutions Engineer, Datameer
For more information, visit http://www.casertaconcepts.com/ or http://www.datameer.com/
How to Build a Recommendation Engine on SparkCaserta
How to Build a Recommendation Engine on Spark was a presentation given by Joe Caserta, CEO and founder of Caserta Concepts, at @AnalyticsWeek in Boston.
Boston's Data AnalyticsStreet Conference is a 2 day packed event with thought provoking keynotes, knowledge filled sessions, intense workshops, insightful panels, and real-world case studies - engaging analytics community with latest methodologies and trends. The conference encompasses largest Speaker-to-Attendee ratio for unmatched networking and learning opportunity.
For more information on the services and solutions Caserta Concepts offers, visit our website at http://casertaconcepts.com/.
[AWS LA Media & Entertainment Event 2015]: Cloud Analytics for Audience Engag...Amazon Web Services
This session explores the use of advanced tools and strategies for data analytics to increase audience engagement and customer retention. The presentation examines the use of machine learning to build predictive analytics models to identify customers likely to churn as well as for sentiment analysis for customer interest. This also includes a case study on predictive analytics for OTT (over the top) content delivery.
SQL injection is a code injection technique, used to attack data-driven applications,
in which malicious SQL statements are inserted into an entry field for execution.
This is a method to attack web applications that have a data repository.The
attacker would send a specially crafted SQL statement that is designed to cause
some malicious action.SQL injection is an attack technique that exploits a security
vulnerability occurring in the database layer of an application and a service. This
is most often found within web pages with dynamic content.
How to Build a Recommendation Engine on SparkCaserta
How to Build a Recommendation Engine on Spark was a presentation given by Joe Caserta, CEO and founder of Caserta Concepts, at @AnalyticsWeek in Boston.
Boston's Data AnalyticsStreet Conference is a 2 day packed event with thought provoking keynotes, knowledge filled sessions, intense workshops, insightful panels, and real-world case studies - engaging analytics community with latest methodologies and trends. The conference encompasses largest Speaker-to-Attendee ratio for unmatched networking and learning opportunity.
For more information on the services and solutions Caserta Concepts offers, visit our website at http://casertaconcepts.com/.
[AWS LA Media & Entertainment Event 2015]: Cloud Analytics for Audience Engag...Amazon Web Services
This session explores the use of advanced tools and strategies for data analytics to increase audience engagement and customer retention. The presentation examines the use of machine learning to build predictive analytics models to identify customers likely to churn as well as for sentiment analysis for customer interest. This also includes a case study on predictive analytics for OTT (over the top) content delivery.
SQL injection is a code injection technique, used to attack data-driven applications,
in which malicious SQL statements are inserted into an entry field for execution.
This is a method to attack web applications that have a data repository.The
attacker would send a specially crafted SQL statement that is designed to cause
some malicious action.SQL injection is an attack technique that exploits a security
vulnerability occurring in the database layer of an application and a service. This
is most often found within web pages with dynamic content.
Using Machine Learning & Spark to Power Data-Driven MarketingCaserta
Joe Caserta provides a statistically-driven model to understanding the customer path to purchase, which combines online, offline and third-party data sources. He shows how customer data is fed to machine learning, which assigns weighted credit to customer interactions in order to give insight to what marketing activities truly matter. This presentation is from Caserta's February 2018 Big Data Warehousing Meetup co-hosted with Databricks.
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Caserta
Joe Caserta explores the world of analytics, tech, and AI to paint a picture of where business is headed. This presentation is from the CDAO Exchange in Miami 2018.
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
Over the past eight or nine years, applying DevOps practices to various areas of technology within business has grown in popularity and produced demonstrable results. These principles are particularly fruitful when applied to a data analytics environment. Bob Eilbacher explains how to implement a strong DevOps practice for data analysis, starting with the necessary cultural changes that must be made at the executive level and ending with an overview of potential DevOps toolchains. Bob also outlines why DevOps and disruption management go hand in hand.
Topics include:
- The benefits of a DevOps approach, with an emphasis on improving quality and efficiency of data analytics
- Why the push for a DevOps practice needs to come from the C-suite and how it can be integrated into all levels of business
- An overview of the best tools for developers, data analysts, and everyone in between, based on the business’s existing data ecosystem
- The challenges that come with transforming into an analytics-driven company and how to overcome them
- Practical use cases from Caserta clients
This presentation was originally given by Bob at the 2017 Strata Data Conference in New York City.
General Data Protection Regulation - BDW Meetup, October 11th, 2017Caserta
Caserta Presentation:
General Data Protection Regulation (GDPR) is a business and technical challenge for companies worldwide - and the deadlines are coming fast! American institutions that do business in the EU or have customers from the EU will have their data practices affected. With this in mind, Caserta – joined by Waterline Data, Salt Recruiting, and Squire Patton Boggs – hosted a BDW Meetup on the GDPR, which is perhaps the most controversial data legislation that has been passed to date.
Joe Caserta, Founding President, Caserta, spoke on the basics of the GDPR, how it will impact data privacy around the world, and some techniques geared towards compliance.
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Caserta
The role of the Chief Data Officer (CDO) has become integral to the evolution needed to turn a wisdom-driven company into an analytics-driven company. With Data Governance at the core of your responsibility, moving the innovation meter is a global challenge among CDOs. Specifically the CDO must:
• Provide a single point of accountability for data initiatives and issues
• Innovate ways to use existing data and evangelize a data vision for the organization
• Support & enforce data governance policies via outreach, training & tools
• Work with IT to develop/maintain an enterprise data repository
• Set standards for analytical reporting and generate data insights through data science
In this session, Joe Caserta addresses real-word CDO challenges, shares techniques to overcome them, manage corporate disruption and achieve success.
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
The “Big Data era” has ushered in an avalanche of new technologies and approaches for delivering information and insights to business users. What is the role of the cloud in your analytical environment? How can you make your migration as seamless as possible? This closing keynote, delivered by Joe Caserta, a prominent consultant who has helped many global enterprises adopt Big Data, provided the audience with the inside scoop needed to supplement data warehousing environments with data intelligence—the amalgamation of Big Data and business intelligence.
This presentation was given as the closing keynote at DBTA's annual Data Summit in NYC.
Introduction to Data Science (Data Summit, 2017)Caserta
At DBTA's 2017 Data Summit in New York, NY, Caserta Founder & President, Joe Caserta, and Senior Architect, Bill Walrond, gave a pre-conference workshop presenting the ins and outs of data science. Data scientist has been dubbed the "sexiest" job of the 21st century, but it requires an understanding of many different elements of data analysis. This presentation dives into the fundamentals of data exploration, mining, and preparation, applying the principles of statistical modeling and data visualization in real-world applications.
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Caserta
Looker Presentation
Caserta Concepts, Blue Apron, and Looker are joining forces to recount the journey of an enterprise-level migration to the cloud with exciting new technologies. This is the analytics platform of the future.
Greg Wells shares Caserta's view on Enterprise Cloud Adoption and how this is impacting BI and the data architecture needed to support big data analytics. Daniel Mintz, Looker’s Chief Data Evangelist, presents a comprehensive introduction to Looker, and Jason Jho, Head of Data Engineering, Blue Apron, gives the audience an in-depth look at the environment that is helping Blue Apron achieve success by using data to become an analytics-driven company.
There is an overwhelming list of expectations – and challenges – in this new, emerging and evolving role. In this presentation, given at the 2016 CDO Summit, Joe Caserta focuses on:
- Defining the CDO title
- Outlining the skills that enhance chances for success
- Listing all the many things the company thinks you are responsible for
- Providing an overview of the core technologies you need to be familiar with and will serve to ultimately support your success
- Presenting a concise list of the most pressing challenges
- Sharing insights and arguments for how best to meet the challenges and succeed in your new role
Building a New Platform for Customer Analytics Caserta
Caserta Concepts and Databricks partner up to bring you this insightful webinar on how a business can choose from all of the emerging big data technologies to figure out which one best fits their needs.
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
Caserta Concepts Founder and President, Joe Caserta, gave this presentation at Strata + Hadoop World 2016 in New York, NY. His session covers path-to-purchase analytics using a data lake and spark.
For more information, visit http://casertaconcepts.com/
Joe Caserta was a featured speaker, along with MIT Sloan School faculty and other industry thought-leaders. His session 'You're the New CDO, Now What?' discussed how new CDOs can accomplish their strategic objectives and overcome tactical challenges in this emerging executive leadership role.
In its tenth year, the MIT CDOIQ Symposium 2016 continues to explore the developing role of the Chief Data Officer.
For more information, visit http://casertaconcepts.com/
The Data Lake - Balancing Data Governance and Innovation Caserta
Joe Caserta gave the presentation "The Data Lake - Balancing Data Governance and Innovation" at DAMA NY's one day mini-conference on May 19th. Speakers covered emerging trends in Data Governance, especially around Big Data.
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Joe Caserta's 2016 Data Summit Workshop "Introduction to Data Science with Hadoop" on May 9, expanded on his Intro to Data Science Workshop held at last year's Summit. Again, Joe presented to a standing-room only audience with a focus on the data lake, governance and the role of the data scientist.
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
The 20th annual Enterprise Data World (EDW) Conference took place in San Diego last month April 17-21. It is recognized as the most comprehensive educational conference on data management in the world.
Joe Caserta was a featured presenter. His session “Evolving from the Data Warehouse to Big Data Analytics - the Emerging Role of the Data Lake," highlighted the challenges and steps to needed to becoming a data-driven organization.
Joe also participated in in two panel discussions during the show:
• "Data Lake or Data Warehouse?"
• "Big Data Investments Have Been Made, But What's Next
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
During this Big Data Warehousing Meetup, Caserta Concepts and Databricks addressed the number one operational and analytic goal of nearly every organization today – to have complete view of every customer. Customer Data Integration (CDI) must be implemented to cleanse and match customer identities within and across various data systems. CDI has been a long-standing data engineering challenge, not just one of logic and complexity but also of performance and scalability.
The speakers brought together best practice techniques with Apache Spark to achieve complete CDI.
Speakers:
Joe Caserta, President, Caserta Concepts
Kevin Rasmussen, Big Data Engineer, Caserta Concepts
Vida Ha, Lead Solutions Engineer, Databricks
The sessions covered a series of problems that are adequately solved with Apache Spark, as well as those that are require additional technologies to implement correctly. Topics included:
· Building an end-to-end CDI pipeline in Apache Spark
· What works, what doesn’t, and how do we use Spark we evolve
· Innovation with Spark including methods for customer matching from statistical patterns, geolocation, and behavior
· Using Pyspark and Python’s rich module ecosystem for data cleansing and standardization matching
· Using GraphX for matching and scalable clustering
· Analyzing large data files with Spark
· Using Spark for ETL on large datasets
· Applying Machine Learning & Data Science to large datasets
· Connecting BI/Visualization tools to Apache Spark to analyze large datasets internally
The speakers also touched on data governance, on-boarding new data rapidly, how to balance rapid agility and time to market with critical decision support and customer interaction. They also shared examples of problems that Apache Spark is not optimized for.
For more information on the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/
During this Big Data Warehousing Meetup, Caserta Concepts and Databricks addressed the number one operational and analytic goal of nearly every organization today – to have complete view of every customer. Customer Data Integration (CDI) must be implemented to cleanse and match customer identities within and across various data systems. CDI has been a long-standing data engineering challenge, not just one of logic and complexity but also of performance and scalability.
The speakers brought together best practice techniques with Apache Spark to achieve complete CDI.
Speakers:
Joe Caserta, President, Caserta Concepts
Kevin Rasmussen, Big Data Engineer, Caserta Concepts
Vida Ha, Lead Solutions Engineer, Databricks
The sessions covered a series of problems that are adequately solved with Apache Spark, as well as those that are require additional technologies to implement correctly. Topics included:
· Building an end-to-end CDI pipeline in Apache Spark
· What works, what doesn’t, and how do we use Spark we evolve
· Innovation with Spark including methods for customer matching from statistical patterns, geolocation, and behavior
· Using Pyspark and Python’s rich module ecosystem for data cleansing and standardization matching
· Using GraphX for matching and scalable clustering
· Analyzing large data files with Spark
· Using Spark for ETL on large datasets
· Applying Machine Learning & Data Science to large datasets
· Connecting BI/Visualization tools to Apache Spark to analyze large datasets internally
The speakers also touched on data governance, on-boarding new data rapidly, how to balance rapid agility and time to market with critical decision support and customer interaction. They also shared examples of problems that Apache Spark is not optimized for.
For more information on the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Using Machine Learning & Spark to Power Data-Driven MarketingCaserta
Joe Caserta provides a statistically-driven model to understanding the customer path to purchase, which combines online, offline and third-party data sources. He shows how customer data is fed to machine learning, which assigns weighted credit to customer interactions in order to give insight to what marketing activities truly matter. This presentation is from Caserta's February 2018 Big Data Warehousing Meetup co-hosted with Databricks.
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Caserta
Joe Caserta explores the world of analytics, tech, and AI to paint a picture of where business is headed. This presentation is from the CDAO Exchange in Miami 2018.
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
Over the past eight or nine years, applying DevOps practices to various areas of technology within business has grown in popularity and produced demonstrable results. These principles are particularly fruitful when applied to a data analytics environment. Bob Eilbacher explains how to implement a strong DevOps practice for data analysis, starting with the necessary cultural changes that must be made at the executive level and ending with an overview of potential DevOps toolchains. Bob also outlines why DevOps and disruption management go hand in hand.
Topics include:
- The benefits of a DevOps approach, with an emphasis on improving quality and efficiency of data analytics
- Why the push for a DevOps practice needs to come from the C-suite and how it can be integrated into all levels of business
- An overview of the best tools for developers, data analysts, and everyone in between, based on the business’s existing data ecosystem
- The challenges that come with transforming into an analytics-driven company and how to overcome them
- Practical use cases from Caserta clients
This presentation was originally given by Bob at the 2017 Strata Data Conference in New York City.
General Data Protection Regulation - BDW Meetup, October 11th, 2017Caserta
Caserta Presentation:
General Data Protection Regulation (GDPR) is a business and technical challenge for companies worldwide - and the deadlines are coming fast! American institutions that do business in the EU or have customers from the EU will have their data practices affected. With this in mind, Caserta – joined by Waterline Data, Salt Recruiting, and Squire Patton Boggs – hosted a BDW Meetup on the GDPR, which is perhaps the most controversial data legislation that has been passed to date.
Joe Caserta, Founding President, Caserta, spoke on the basics of the GDPR, how it will impact data privacy around the world, and some techniques geared towards compliance.
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Caserta
The role of the Chief Data Officer (CDO) has become integral to the evolution needed to turn a wisdom-driven company into an analytics-driven company. With Data Governance at the core of your responsibility, moving the innovation meter is a global challenge among CDOs. Specifically the CDO must:
• Provide a single point of accountability for data initiatives and issues
• Innovate ways to use existing data and evangelize a data vision for the organization
• Support & enforce data governance policies via outreach, training & tools
• Work with IT to develop/maintain an enterprise data repository
• Set standards for analytical reporting and generate data insights through data science
In this session, Joe Caserta addresses real-word CDO challenges, shares techniques to overcome them, manage corporate disruption and achieve success.
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
The “Big Data era” has ushered in an avalanche of new technologies and approaches for delivering information and insights to business users. What is the role of the cloud in your analytical environment? How can you make your migration as seamless as possible? This closing keynote, delivered by Joe Caserta, a prominent consultant who has helped many global enterprises adopt Big Data, provided the audience with the inside scoop needed to supplement data warehousing environments with data intelligence—the amalgamation of Big Data and business intelligence.
This presentation was given as the closing keynote at DBTA's annual Data Summit in NYC.
Introduction to Data Science (Data Summit, 2017)Caserta
At DBTA's 2017 Data Summit in New York, NY, Caserta Founder & President, Joe Caserta, and Senior Architect, Bill Walrond, gave a pre-conference workshop presenting the ins and outs of data science. Data scientist has been dubbed the "sexiest" job of the 21st century, but it requires an understanding of many different elements of data analysis. This presentation dives into the fundamentals of data exploration, mining, and preparation, applying the principles of statistical modeling and data visualization in real-world applications.
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Caserta
Looker Presentation
Caserta Concepts, Blue Apron, and Looker are joining forces to recount the journey of an enterprise-level migration to the cloud with exciting new technologies. This is the analytics platform of the future.
Greg Wells shares Caserta's view on Enterprise Cloud Adoption and how this is impacting BI and the data architecture needed to support big data analytics. Daniel Mintz, Looker’s Chief Data Evangelist, presents a comprehensive introduction to Looker, and Jason Jho, Head of Data Engineering, Blue Apron, gives the audience an in-depth look at the environment that is helping Blue Apron achieve success by using data to become an analytics-driven company.
There is an overwhelming list of expectations – and challenges – in this new, emerging and evolving role. In this presentation, given at the 2016 CDO Summit, Joe Caserta focuses on:
- Defining the CDO title
- Outlining the skills that enhance chances for success
- Listing all the many things the company thinks you are responsible for
- Providing an overview of the core technologies you need to be familiar with and will serve to ultimately support your success
- Presenting a concise list of the most pressing challenges
- Sharing insights and arguments for how best to meet the challenges and succeed in your new role
Building a New Platform for Customer Analytics Caserta
Caserta Concepts and Databricks partner up to bring you this insightful webinar on how a business can choose from all of the emerging big data technologies to figure out which one best fits their needs.
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
Caserta Concepts Founder and President, Joe Caserta, gave this presentation at Strata + Hadoop World 2016 in New York, NY. His session covers path-to-purchase analytics using a data lake and spark.
For more information, visit http://casertaconcepts.com/
Joe Caserta was a featured speaker, along with MIT Sloan School faculty and other industry thought-leaders. His session 'You're the New CDO, Now What?' discussed how new CDOs can accomplish their strategic objectives and overcome tactical challenges in this emerging executive leadership role.
In its tenth year, the MIT CDOIQ Symposium 2016 continues to explore the developing role of the Chief Data Officer.
For more information, visit http://casertaconcepts.com/
The Data Lake - Balancing Data Governance and Innovation Caserta
Joe Caserta gave the presentation "The Data Lake - Balancing Data Governance and Innovation" at DAMA NY's one day mini-conference on May 19th. Speakers covered emerging trends in Data Governance, especially around Big Data.
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Joe Caserta's 2016 Data Summit Workshop "Introduction to Data Science with Hadoop" on May 9, expanded on his Intro to Data Science Workshop held at last year's Summit. Again, Joe presented to a standing-room only audience with a focus on the data lake, governance and the role of the data scientist.
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
The 20th annual Enterprise Data World (EDW) Conference took place in San Diego last month April 17-21. It is recognized as the most comprehensive educational conference on data management in the world.
Joe Caserta was a featured presenter. His session “Evolving from the Data Warehouse to Big Data Analytics - the Emerging Role of the Data Lake," highlighted the challenges and steps to needed to becoming a data-driven organization.
Joe also participated in in two panel discussions during the show:
• "Data Lake or Data Warehouse?"
• "Big Data Investments Have Been Made, But What's Next
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
During this Big Data Warehousing Meetup, Caserta Concepts and Databricks addressed the number one operational and analytic goal of nearly every organization today – to have complete view of every customer. Customer Data Integration (CDI) must be implemented to cleanse and match customer identities within and across various data systems. CDI has been a long-standing data engineering challenge, not just one of logic and complexity but also of performance and scalability.
The speakers brought together best practice techniques with Apache Spark to achieve complete CDI.
Speakers:
Joe Caserta, President, Caserta Concepts
Kevin Rasmussen, Big Data Engineer, Caserta Concepts
Vida Ha, Lead Solutions Engineer, Databricks
The sessions covered a series of problems that are adequately solved with Apache Spark, as well as those that are require additional technologies to implement correctly. Topics included:
· Building an end-to-end CDI pipeline in Apache Spark
· What works, what doesn’t, and how do we use Spark we evolve
· Innovation with Spark including methods for customer matching from statistical patterns, geolocation, and behavior
· Using Pyspark and Python’s rich module ecosystem for data cleansing and standardization matching
· Using GraphX for matching and scalable clustering
· Analyzing large data files with Spark
· Using Spark for ETL on large datasets
· Applying Machine Learning & Data Science to large datasets
· Connecting BI/Visualization tools to Apache Spark to analyze large datasets internally
The speakers also touched on data governance, on-boarding new data rapidly, how to balance rapid agility and time to market with critical decision support and customer interaction. They also shared examples of problems that Apache Spark is not optimized for.
For more information on the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/
During this Big Data Warehousing Meetup, Caserta Concepts and Databricks addressed the number one operational and analytic goal of nearly every organization today – to have complete view of every customer. Customer Data Integration (CDI) must be implemented to cleanse and match customer identities within and across various data systems. CDI has been a long-standing data engineering challenge, not just one of logic and complexity but also of performance and scalability.
The speakers brought together best practice techniques with Apache Spark to achieve complete CDI.
Speakers:
Joe Caserta, President, Caserta Concepts
Kevin Rasmussen, Big Data Engineer, Caserta Concepts
Vida Ha, Lead Solutions Engineer, Databricks
The sessions covered a series of problems that are adequately solved with Apache Spark, as well as those that are require additional technologies to implement correctly. Topics included:
· Building an end-to-end CDI pipeline in Apache Spark
· What works, what doesn’t, and how do we use Spark we evolve
· Innovation with Spark including methods for customer matching from statistical patterns, geolocation, and behavior
· Using Pyspark and Python’s rich module ecosystem for data cleansing and standardization matching
· Using GraphX for matching and scalable clustering
· Analyzing large data files with Spark
· Using Spark for ETL on large datasets
· Applying Machine Learning & Data Science to large datasets
· Connecting BI/Visualization tools to Apache Spark to analyze large datasets internally
The speakers also touched on data governance, on-boarding new data rapidly, how to balance rapid agility and time to market with critical decision support and customer interaction. They also shared examples of problems that Apache Spark is not optimized for.
For more information on the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Webinar Presentation: Building a Big Data Recommendation Engine
1. Focused Expertise Industries Served
• Data Warehouse Design • Healthcare / Insurance
• Business Intelligence • Financial Services
• Big Data Analytics • Retail / eCommerce
• Search / Relevance • Digital Media / Marketing
• Infographics • K-12 / Higher Education
445 Park Ave New York, NY | 1-855-755-2246 | info@casertaconcepts.com
Big Data
Analytics
2. Recommendations
• Your customers expect them
• Good recommendations make life easier
• Help them find information, products, and services they might not
have thought of
• What makes a good recommendation?
• Relevant but not obvious
• Sense of “surprise”
SOLD!! 23” LED TV 24” LED TV 25” LED TV
23” LED TV``
Blu-Ray Home Theater HDMI Cables
3. Where can recommendations
engines be found?
• Applications can be found in a wide variety of industries
and applications:
• Travel
• Service Industry
• Music/Online radio
• TV and Video
• Online Publications
• Retail
..and countless others
Our Use Case: Movie Ratings!
4. Our Goal
• Create a powerful, scalable recommendation engine with minimal
development
• Make recommendations to users as they are browsing movie titles -
instantaneously
• Recommendation must have context to the movie they are currently
viewing.
OOPS! – too much surprise!
5. How do we hope to accomplish this?
Hadoop – distributed file system and processing platform
Mahout – collection of machine learning libraries
We will leverage 2 algorithms:
• Item Similarity– how similar is this particular movie to other
movies based on usage
• Item-Based Recommender – predict an individuals
preference based on their peers ratings
• Both algorithms only require a simple dataset of 3 fields:
“User ID” , “Item ID”, “Rating”
6. Item Similarity – Context, Content Filtering
“People who liked this movie liked these as well”
• Item Similarity builds a matrix of items to other items and calculates
similarity (based on user rating)
• The most similar item are then output as a list:
• Item ID, Similar Item ID, Similarity Score
• Items with the highest score are most similar
• In this example users who liked “Twelve Monkeys” (7) also like “Fargo” (100)
7 100 0.690951001800917
7 50 0.653299445638532
7 117 0.643701303640083
7. Item-Base – Peer, Collaborative Filtering
“People with similar taste to you liked these movies”
• Item-Base takes the Item Similarity matrix and weights based on
“peer” user preference.
• Essentially it determines the best movie critics for you to follow
• The items with the highest recommendation score are then output as tuples
• User ID [Item ID1:Score,…., Item IDn:Score]
• Items with the highest recommendation score are the most relevant to this user
• For user “Johny Sisklebert” (572), the two most highly recommended movies are
“Seven” and “Donnie Brasco”
572 [11:5.0,293:4.70718,8:4.688335,273:4.687676,427:4.685926,234:4.683155,168:4.669672,89:4.66959,4:4.65515]
573 [487:4.54397,1203:4.5291,616:4.51644,605:4.49344,709:4.3406,502:4.33706,152:4.32263,503:4.20515,432:4.26455,611:4.22019]
574 [1:5.0,902:5.0,546:5.0,13:5.0,534:5.0,533:5.0,531:5.0,1082:5.0,1631:5.0,515:5.0]
8. Recommendation Store
• Serving recommendations needs to be instantaneous
We need a database!
• The core to this solution is two reference tables:
Rec_Item_Similarity Rec_User_Item_Base
Item_ID User_ID
Similar_Item Item_ID
Similarity_Score Recommendation_Score
• When called to make recommendations we query our store
• Rec_Item_Similarity based on the Item_ID they are viewing
• Rec_User_Item_Base based on their User_ID
9. Delivering Recommendations
So if Johny is viewing “12 Monkeys” we query our
recommendation store and present the results
Item Similarity Raw Score Score
Item-Base (Peer) Raw Score Score
Fargo 0.691 1.000
Seven 5.000 1.000
Star Wars 0.653 0.946
Donnie Brasco 4.707
Item-Based: 0.941
Rock, The 0.644 0.932
Babe 4.688 0.938
Pulp Fiction 0.628 0.909 Peers like these
Heat 4.688 0.938
Return of the Jedi 0.627 0.908 Movies
To Kill a Mockingbird 4.686 0.937
Independence Day 0.618 0.894
Jaws 4.683 0.937
Willy Wonka 0.603 0.872
Monty Python, Holy Grail 4.670 0.934
Mission: Impossible 0.597 0.864 Best
Blade Runner 4.670 0.934
Silence of the Lambs, The 0.596 0.863
Get Shorty
Recommendations
4.655 0.931
Star Trek: First Contact 0.594 0.859
Raiders of the Lost Ark 0.584 0.845
Terminator, The 0.574 0.831 Top 10 Recommendations
Blade Runner 0.571 0.826
Usual Suspects, The 0.569 0.823 Seven (Se7en) 1.823
Seven (Se7en) 0.569 0.823 Blade Runner 1.760
Fargo 1.000
Star Wars 0.946
Donnie Brasco 0.941
Babe 0.938
Heat 0.938
To Kill a Mockingbird 0.937
Jaws 0.937
Monty Python, Holy Grail 0.934
10. From Good to Great Recommendations
• Note that the first 5 recommendations look pretty good
…but the 6th result would have been “Babe” the children's movie
OOPS!
• Tuning the algorithms might help: parameter changes, similarity
measures.
• How else can we make it better?
1. Delivery filters
2. Introduce additional algorithms such as K-Means, or Fuzzy K-Means
11. Delivery Scoring and Filters
Apply assumptions to control the results of collaborative filtering
• One or more categories must match
• Only children movies will be recommended for children's movies.
Action Adventure Children's Comedy Crime Drama Film-Noir Horror Romance Sci-Fi Thriller
Twelve Monkeys 0 0 0 0 0 1 0 0 0 1 0
Babe 0 0 1 1 0 1 0 0 0 0 0
Seven (Se7en) 0 0 0 0 1 1 0 0 0 0 1
Star Wars 1 1 0 0 0 0 0 0 1 1 0
Blade Runner 0 0 0 0 0 0 1 0 0 1 0
Fargo 0 0 0 0 1 1 0 0 0 0 1
Willy Wonka 0 1 1 1 0 0 0 0 0 0 0
Monty Python 0 0 0 1 0 0 0 0 0 0 0
Jaws 1 0 0 0 0 0 0 1 0 0 0
Heat 1 0 0 0 1 0 0 0 0 0 1
Donnie Brasco 0 0 0 0 1 1 0 0 0 0 0
To Kill a Mockingbird 0 0 0 0 0 1 0 0 0 0 0
Similarly logic could be applied to promote more favorable options
• New Releases
• Retail Case: Items that are on-sale, overstock
12. Additional Algorithm – K-Means
“These movies are similar based on their attributes”
• Treats items as coordinates
• Places a number of random
“centroids” and assigns the
nearest items
• Moves the centroids around based
on average location
• Process repeats until the
assignments stop changing
We would use the major attributes of the Movie to create coordinate points.
• Categories
• Actors
• Director
• Synopsis Text
13. Integrating K-Means into the process
Movies recommended by more than 1 algorithm are the most highly rated
K-Means:
Item-Based Similar
Item Similarity
Best
Recommendations
14. Summary
• Mahout and Hadoop can provide a relatively low cost and
extremely scalable platform for recommendations
• Mahout offers a great library of established Machine Learning
libraries, reducing development efforts
• A good recommendation system combines Collaborative and
Content filtering algorithms
elliott@casertaconcepts.com