Against the backdrop of Big Data, the Chief Data Officer, by any name, is emerging as the central player in the business of data, including cybersecurity. The MITCDOIQ Symposium explored the developing landscape, from local organizational issues to global challenges, through case studies from industry, academic, government and healthcare leaders.
Joe Caserta, president at Caserta Concepts, presented "Big Data's Impact on the Enterprise" at the MITCDOIQ Symposium.
Presentation Abstract: Organizations are challenged with managing an unprecedented volume of structured and unstructured data coming into the enterprise from a variety of verified and unverified sources. With that is the urgency to rapidly maximize value while also maintaining high data quality.
Today we start with some history and the components of data governance and information quality necessary for successful solutions. I then bring it all to life with 2 client success stories, one in healthcare and the other in banking and financial services. These case histories illustrate how accurate, complete, consistent and reliable data results in a competitive advantage and enhanced end-user and customer satisfaction.
To learn more, visit www.casertaconcepts.com
Joe Caserta, President at Caserta Concepts, presented "Setting Up the Data Lake" at a DAMA Philadelphia Chapter Meeting.
For more information on the services offered by Caserta Concepts, visit our website at http://casertaconcepts.com/.
In this presentation at DAMA New York, Joe started by asking a key question: why are we doing this? Why analyze and share all these massive amounts of data? Basically, it comes down to the belief that in any organization, in any situation, if we can get the data and make it correct and timely, insights from it will become instantly actionable for companies to function more nimbly and successfully. Enabling the use of data can be a world-changing, world-improving activity and this session presents the steps necessary to get you there. Joe explained the concept of the "data lake" and also emphasizes the role of a strong data governance strategy that incorporates seven components needed for a successful program.
For more information on this presentation or Caserta Concepts, visit our website at http://casertaconcepts.com/.
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented What Data Do You Have and Where is it?
For more information on the services offered by Caserta Concepts, visit out website at http://casertaconcepts.com/.
The 20th annual Enterprise Data World (EDW) Conference took place in San Diego last month April 17-21. It is recognized as the most comprehensive educational conference on data management in the world.
Joe Caserta was a featured presenter. His session “Evolving from the Data Warehouse to Big Data Analytics - the Emerging Role of the Data Lake," highlighted the challenges and steps to needed to becoming a data-driven organization.
Joe also participated in in two panel discussions during the show:
• "Data Lake or Data Warehouse?"
• "Big Data Investments Have Been Made, But What's Next
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
Joe Caserta presents his vision of the future of Big Data in the Enterprise.
At the recent Harrisburg University Analytics Summit II, Joe Caserta gave this engaging presentation to Summit attendees including fellow academics, strategists, data scientists and analysts.
How do you balance the need for structured and rule-based governance to assure enterprise data quality - with the imperative to innovate in order to stay relevant and competitive in today's business marketplace?
At the recent CDO Summit in NYC, a range of C-Level Executives across a variety of industries came to hear Joe Caserta, president of Caserta Concepts, put it all in perspective.
Joe talked about the challenges of "data sprawl" and the paradigm shift underway in the evolving big data and data-driven world.
For more information or to contact us, visit http://casertaconcepts.com/
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Joe Caserta's 2016 Data Summit Workshop "Introduction to Data Science with Hadoop" on May 9, expanded on his Intro to Data Science Workshop held at last year's Summit. Again, Joe presented to a standing-room only audience with a focus on the data lake, governance and the role of the data scientist.
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Joe Caserta, President at Caserta Concepts, presented "Setting Up the Data Lake" at a DAMA Philadelphia Chapter Meeting.
For more information on the services offered by Caserta Concepts, visit our website at http://casertaconcepts.com/.
In this presentation at DAMA New York, Joe started by asking a key question: why are we doing this? Why analyze and share all these massive amounts of data? Basically, it comes down to the belief that in any organization, in any situation, if we can get the data and make it correct and timely, insights from it will become instantly actionable for companies to function more nimbly and successfully. Enabling the use of data can be a world-changing, world-improving activity and this session presents the steps necessary to get you there. Joe explained the concept of the "data lake" and also emphasizes the role of a strong data governance strategy that incorporates seven components needed for a successful program.
For more information on this presentation or Caserta Concepts, visit our website at http://casertaconcepts.com/.
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented What Data Do You Have and Where is it?
For more information on the services offered by Caserta Concepts, visit out website at http://casertaconcepts.com/.
The 20th annual Enterprise Data World (EDW) Conference took place in San Diego last month April 17-21. It is recognized as the most comprehensive educational conference on data management in the world.
Joe Caserta was a featured presenter. His session “Evolving from the Data Warehouse to Big Data Analytics - the Emerging Role of the Data Lake," highlighted the challenges and steps to needed to becoming a data-driven organization.
Joe also participated in in two panel discussions during the show:
• "Data Lake or Data Warehouse?"
• "Big Data Investments Have Been Made, But What's Next
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
Joe Caserta presents his vision of the future of Big Data in the Enterprise.
At the recent Harrisburg University Analytics Summit II, Joe Caserta gave this engaging presentation to Summit attendees including fellow academics, strategists, data scientists and analysts.
How do you balance the need for structured and rule-based governance to assure enterprise data quality - with the imperative to innovate in order to stay relevant and competitive in today's business marketplace?
At the recent CDO Summit in NYC, a range of C-Level Executives across a variety of industries came to hear Joe Caserta, president of Caserta Concepts, put it all in perspective.
Joe talked about the challenges of "data sprawl" and the paradigm shift underway in the evolving big data and data-driven world.
For more information or to contact us, visit http://casertaconcepts.com/
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Joe Caserta's 2016 Data Summit Workshop "Introduction to Data Science with Hadoop" on May 9, expanded on his Intro to Data Science Workshop held at last year's Summit. Again, Joe presented to a standing-room only audience with a focus on the data lake, governance and the role of the data scientist.
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Joe Caserta was a featured speaker, along with MIT Sloan School faculty and other industry thought-leaders. His session 'You're the New CDO, Now What?' discussed how new CDOs can accomplish their strategic objectives and overcome tactical challenges in this emerging executive leadership role.
In its tenth year, the MIT CDOIQ Symposium 2016 continues to explore the developing role of the Chief Data Officer.
For more information, visit http://casertaconcepts.com/
Moving Past Infrastructure Limitations Presented by MediaMath
This presentation was given at a Big Data Warehousing Meetup with Caserta Concepts, MediaMath and Qubole. You can learn more about the event here: http://www.meetup.com/Big-Data-Warehousing/events/228372516/
Event description:
At Caserta Concepts, we are firm believers in big data thriving on the cloud. The instant-on, nearly unlimited storage and computing capabilities of AWS has made it the defacto solution for a full spectrum of organizations needing to process large amounts of data.
What's more, an ecosystem of value-added platforms has emerged to further ease and democratize the implementation of cloud based solutions. Qubole has developed a great platform for easily deploying and managing ephemeral and long-lived Hadoop and Spark clusters on AWS.
Moving Past Infrastructure Limitations: Data Warehousing at MediaMath
Over the past year and a half, MediaMath has undertaken a “data liberation” effort in an attempt to leave their bigbox, monolithic data warehouse behind. In this talk, Rory Sawyer, Software Engineer at MediaMath, will describe how this effort transformed MediaMath’s legacy architecture and legacy mindset, which imposed harsh inefficiencies on data sharing and utilization. The current mindset removes these inefficiencies and allows them to say “yes” to more projects and ideas.
Rory will also demo how MediaMath uses Amazon Web Services and Qubole so that infrastructure is no longer a limiting factor on what and how users query. This combination allows them to scale their resources up and down as needed while bridging different data sources and execution engines. Using and extending MediaMath’s data warehousing is no longer a privileged activity but an ability that every employee and client has.
Incorporating the Data Lake into Your Analytic ArchitectureCaserta
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented Incorporating the Data Lake into Your Analytics Architecture.
For more information on the services offered by Caserta Concepts, visit out website at http://casertaconcepts.com/.
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
Caserta Concepts Founder and President, Joe Caserta, gave this presentation at Strata + Hadoop World 2016 in New York, NY. His session covers path-to-purchase analytics using a data lake and spark.
For more information, visit http://casertaconcepts.com/
Meaning making – separating signal from noise. How do we transform the customer's next input into an action that creates a positive customer experience? We make the data more intelligent, so that it is able to guide our actions. The Data Lake builds on Big Data strengths by automating many of the manual development tasks, providing several self-service features to end-users, and an intelligent management layer to organize it all. This results in lower cost to create solutions, "smart" analytics, and faster time to business value.
The Data Lake - Balancing Data Governance and Innovation Caserta
Joe Caserta gave the presentation "The Data Lake - Balancing Data Governance and Innovation" at DAMA NY's one day mini-conference on May 19th. Speakers covered emerging trends in Data Governance, especially around Big Data.
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopCaserta
In our most recent Big Data Warehousing Meetup, we learned about transitioning from Big Data 1.0 with Hadoop 1.x with nascent technologies to the advent of Hadoop 2.x with YARN to enable distributed ETL, SQL and Analytics solutions. Caserta Concepts Chief Architect Elliott Cordo and an Actian Engineer covered the complete data value chain of an Enterprise-ready platform including data connectivity, collection, preparation, optimization and analytics with end user access.
For more information on our services or upcoming events, please visit our website at http://www.casertaconcepts.com/.
The Business Data Lake is a new approach to information management, analytics and reporting that better matches the culture of business and better enables organizations to truly leverage the value of their information.
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
Over the past eight or nine years, applying DevOps practices to various areas of technology within business has grown in popularity and produced demonstrable results. These principles are particularly fruitful when applied to a data analytics environment. Bob Eilbacher explains how to implement a strong DevOps practice for data analysis, starting with the necessary cultural changes that must be made at the executive level and ending with an overview of potential DevOps toolchains. Bob also outlines why DevOps and disruption management go hand in hand.
Topics include:
- The benefits of a DevOps approach, with an emphasis on improving quality and efficiency of data analytics
- Why the push for a DevOps practice needs to come from the C-suite and how it can be integrated into all levels of business
- An overview of the best tools for developers, data analysts, and everyone in between, based on the business’s existing data ecosystem
- The challenges that come with transforming into an analytics-driven company and how to overcome them
- Practical use cases from Caserta clients
This presentation was originally given by Bob at the 2017 Strata Data Conference in New York City.
A modern, flexible approach to Hadoop implementation incorporating innovation...DataWorks Summit
A modern, flexible approach to Hadoop implementation incorporating innovations from HP Haven
Jeff Veis
Vice President
HP Software Big Data
Gilles Noisette
Master Solution Architect
HP EMEA Big Data CoE
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
What is Big Data and why it is required and needed for the organization those who really need and generating huge amount of data and when it will be use
There is an overwhelming list of expectations – and challenges – in this new, emerging and evolving role. In this presentation, given at the 2016 CDO Summit, Joe Caserta focuses on:
- Defining the CDO title
- Outlining the skills that enhance chances for success
- Listing all the many things the company thinks you are responsible for
- Providing an overview of the core technologies you need to be familiar with and will serve to ultimately support your success
- Presenting a concise list of the most pressing challenges
- Sharing insights and arguments for how best to meet the challenges and succeed in your new role
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
The “Big Data era” has ushered in an avalanche of new technologies and approaches for delivering information and insights to business users. What is the role of the cloud in your analytical environment? How can you make your migration as seamless as possible? This closing keynote, delivered by Joe Caserta, a prominent consultant who has helped many global enterprises adopt Big Data, provided the audience with the inside scoop needed to supplement data warehousing environments with data intelligence—the amalgamation of Big Data and business intelligence.
This presentation was given as the closing keynote at DBTA's annual Data Summit in NYC.
Defining and Applying Data Governance in Today’s Business EnvironmentCaserta
Caserta Concepts President Joe Caserta featured at Data Governance Winter 2014 Conference with a session on the basic and necessary steps needed for data quality and data governance success
For more information on the event and presentation: http://ow.ly/G3N9N
For more information on the services and solutions offered by Caserta Concepts, visit http://casertaconcepts.com/.
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Caserta
The role of the Chief Data Officer (CDO) has become integral to the evolution needed to turn a wisdom-driven company into an analytics-driven company. With Data Governance at the core of your responsibility, moving the innovation meter is a global challenge among CDOs. Specifically the CDO must:
• Provide a single point of accountability for data initiatives and issues
• Innovate ways to use existing data and evangelize a data vision for the organization
• Support & enforce data governance policies via outreach, training & tools
• Work with IT to develop/maintain an enterprise data repository
• Set standards for analytical reporting and generate data insights through data science
In this session, Joe Caserta addresses real-word CDO challenges, shares techniques to overcome them, manage corporate disruption and achieve success.
During this Big Data Warehousing Meetup, Caserta Concepts and Databricks addressed the number one operational and analytic goal of nearly every organization today – to have complete view of every customer. Customer Data Integration (CDI) must be implemented to cleanse and match customer identities within and across various data systems. CDI has been a long-standing data engineering challenge, not just one of logic and complexity but also of performance and scalability.
The speakers brought together best practice techniques with Apache Spark to achieve complete CDI.
Speakers:
Joe Caserta, President, Caserta Concepts
Kevin Rasmussen, Big Data Engineer, Caserta Concepts
Vida Ha, Lead Solutions Engineer, Databricks
The sessions covered a series of problems that are adequately solved with Apache Spark, as well as those that are require additional technologies to implement correctly. Topics included:
· Building an end-to-end CDI pipeline in Apache Spark
· What works, what doesn’t, and how do we use Spark we evolve
· Innovation with Spark including methods for customer matching from statistical patterns, geolocation, and behavior
· Using Pyspark and Python’s rich module ecosystem for data cleansing and standardization matching
· Using GraphX for matching and scalable clustering
· Analyzing large data files with Spark
· Using Spark for ETL on large datasets
· Applying Machine Learning & Data Science to large datasets
· Connecting BI/Visualization tools to Apache Spark to analyze large datasets internally
The speakers also touched on data governance, on-boarding new data rapidly, how to balance rapid agility and time to market with critical decision support and customer interaction. They also shared examples of problems that Apache Spark is not optimized for.
For more information on the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/
Data Lakes are early in the Gartner hype cycle, but companies are getting value from their cloud-based data lake deployments. Break through the confusion between data lakes and data warehouses and seek out the most appropriate use cases for your big data lakes.
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...StampedeCon
This session addresses the first problems of Big Data & Analytics–Identifying, Indexing, Connecting and Gaining Insight of Existing Data to Drive Value. HPE’s Chief Field Technologist will give her perspectives on Enterprise Search as a Fundamental Cornerstone of Building a Data Driven Enterprise.
Joe Caserta was a featured speaker, along with MIT Sloan School faculty and other industry thought-leaders. His session 'You're the New CDO, Now What?' discussed how new CDOs can accomplish their strategic objectives and overcome tactical challenges in this emerging executive leadership role.
In its tenth year, the MIT CDOIQ Symposium 2016 continues to explore the developing role of the Chief Data Officer.
For more information, visit http://casertaconcepts.com/
Moving Past Infrastructure Limitations Presented by MediaMath
This presentation was given at a Big Data Warehousing Meetup with Caserta Concepts, MediaMath and Qubole. You can learn more about the event here: http://www.meetup.com/Big-Data-Warehousing/events/228372516/
Event description:
At Caserta Concepts, we are firm believers in big data thriving on the cloud. The instant-on, nearly unlimited storage and computing capabilities of AWS has made it the defacto solution for a full spectrum of organizations needing to process large amounts of data.
What's more, an ecosystem of value-added platforms has emerged to further ease and democratize the implementation of cloud based solutions. Qubole has developed a great platform for easily deploying and managing ephemeral and long-lived Hadoop and Spark clusters on AWS.
Moving Past Infrastructure Limitations: Data Warehousing at MediaMath
Over the past year and a half, MediaMath has undertaken a “data liberation” effort in an attempt to leave their bigbox, monolithic data warehouse behind. In this talk, Rory Sawyer, Software Engineer at MediaMath, will describe how this effort transformed MediaMath’s legacy architecture and legacy mindset, which imposed harsh inefficiencies on data sharing and utilization. The current mindset removes these inefficiencies and allows them to say “yes” to more projects and ideas.
Rory will also demo how MediaMath uses Amazon Web Services and Qubole so that infrastructure is no longer a limiting factor on what and how users query. This combination allows them to scale their resources up and down as needed while bridging different data sources and execution engines. Using and extending MediaMath’s data warehousing is no longer a privileged activity but an ability that every employee and client has.
Incorporating the Data Lake into Your Analytic ArchitectureCaserta
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented Incorporating the Data Lake into Your Analytics Architecture.
For more information on the services offered by Caserta Concepts, visit out website at http://casertaconcepts.com/.
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
Caserta Concepts Founder and President, Joe Caserta, gave this presentation at Strata + Hadoop World 2016 in New York, NY. His session covers path-to-purchase analytics using a data lake and spark.
For more information, visit http://casertaconcepts.com/
Meaning making – separating signal from noise. How do we transform the customer's next input into an action that creates a positive customer experience? We make the data more intelligent, so that it is able to guide our actions. The Data Lake builds on Big Data strengths by automating many of the manual development tasks, providing several self-service features to end-users, and an intelligent management layer to organize it all. This results in lower cost to create solutions, "smart" analytics, and faster time to business value.
The Data Lake - Balancing Data Governance and Innovation Caserta
Joe Caserta gave the presentation "The Data Lake - Balancing Data Governance and Innovation" at DAMA NY's one day mini-conference on May 19th. Speakers covered emerging trends in Data Governance, especially around Big Data.
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopCaserta
In our most recent Big Data Warehousing Meetup, we learned about transitioning from Big Data 1.0 with Hadoop 1.x with nascent technologies to the advent of Hadoop 2.x with YARN to enable distributed ETL, SQL and Analytics solutions. Caserta Concepts Chief Architect Elliott Cordo and an Actian Engineer covered the complete data value chain of an Enterprise-ready platform including data connectivity, collection, preparation, optimization and analytics with end user access.
For more information on our services or upcoming events, please visit our website at http://www.casertaconcepts.com/.
The Business Data Lake is a new approach to information management, analytics and reporting that better matches the culture of business and better enables organizations to truly leverage the value of their information.
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
Over the past eight or nine years, applying DevOps practices to various areas of technology within business has grown in popularity and produced demonstrable results. These principles are particularly fruitful when applied to a data analytics environment. Bob Eilbacher explains how to implement a strong DevOps practice for data analysis, starting with the necessary cultural changes that must be made at the executive level and ending with an overview of potential DevOps toolchains. Bob also outlines why DevOps and disruption management go hand in hand.
Topics include:
- The benefits of a DevOps approach, with an emphasis on improving quality and efficiency of data analytics
- Why the push for a DevOps practice needs to come from the C-suite and how it can be integrated into all levels of business
- An overview of the best tools for developers, data analysts, and everyone in between, based on the business’s existing data ecosystem
- The challenges that come with transforming into an analytics-driven company and how to overcome them
- Practical use cases from Caserta clients
This presentation was originally given by Bob at the 2017 Strata Data Conference in New York City.
A modern, flexible approach to Hadoop implementation incorporating innovation...DataWorks Summit
A modern, flexible approach to Hadoop implementation incorporating innovations from HP Haven
Jeff Veis
Vice President
HP Software Big Data
Gilles Noisette
Master Solution Architect
HP EMEA Big Data CoE
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
What is Big Data and why it is required and needed for the organization those who really need and generating huge amount of data and when it will be use
There is an overwhelming list of expectations – and challenges – in this new, emerging and evolving role. In this presentation, given at the 2016 CDO Summit, Joe Caserta focuses on:
- Defining the CDO title
- Outlining the skills that enhance chances for success
- Listing all the many things the company thinks you are responsible for
- Providing an overview of the core technologies you need to be familiar with and will serve to ultimately support your success
- Presenting a concise list of the most pressing challenges
- Sharing insights and arguments for how best to meet the challenges and succeed in your new role
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
The “Big Data era” has ushered in an avalanche of new technologies and approaches for delivering information and insights to business users. What is the role of the cloud in your analytical environment? How can you make your migration as seamless as possible? This closing keynote, delivered by Joe Caserta, a prominent consultant who has helped many global enterprises adopt Big Data, provided the audience with the inside scoop needed to supplement data warehousing environments with data intelligence—the amalgamation of Big Data and business intelligence.
This presentation was given as the closing keynote at DBTA's annual Data Summit in NYC.
Defining and Applying Data Governance in Today’s Business EnvironmentCaserta
Caserta Concepts President Joe Caserta featured at Data Governance Winter 2014 Conference with a session on the basic and necessary steps needed for data quality and data governance success
For more information on the event and presentation: http://ow.ly/G3N9N
For more information on the services and solutions offered by Caserta Concepts, visit http://casertaconcepts.com/.
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Caserta
The role of the Chief Data Officer (CDO) has become integral to the evolution needed to turn a wisdom-driven company into an analytics-driven company. With Data Governance at the core of your responsibility, moving the innovation meter is a global challenge among CDOs. Specifically the CDO must:
• Provide a single point of accountability for data initiatives and issues
• Innovate ways to use existing data and evangelize a data vision for the organization
• Support & enforce data governance policies via outreach, training & tools
• Work with IT to develop/maintain an enterprise data repository
• Set standards for analytical reporting and generate data insights through data science
In this session, Joe Caserta addresses real-word CDO challenges, shares techniques to overcome them, manage corporate disruption and achieve success.
During this Big Data Warehousing Meetup, Caserta Concepts and Databricks addressed the number one operational and analytic goal of nearly every organization today – to have complete view of every customer. Customer Data Integration (CDI) must be implemented to cleanse and match customer identities within and across various data systems. CDI has been a long-standing data engineering challenge, not just one of logic and complexity but also of performance and scalability.
The speakers brought together best practice techniques with Apache Spark to achieve complete CDI.
Speakers:
Joe Caserta, President, Caserta Concepts
Kevin Rasmussen, Big Data Engineer, Caserta Concepts
Vida Ha, Lead Solutions Engineer, Databricks
The sessions covered a series of problems that are adequately solved with Apache Spark, as well as those that are require additional technologies to implement correctly. Topics included:
· Building an end-to-end CDI pipeline in Apache Spark
· What works, what doesn’t, and how do we use Spark we evolve
· Innovation with Spark including methods for customer matching from statistical patterns, geolocation, and behavior
· Using Pyspark and Python’s rich module ecosystem for data cleansing and standardization matching
· Using GraphX for matching and scalable clustering
· Analyzing large data files with Spark
· Using Spark for ETL on large datasets
· Applying Machine Learning & Data Science to large datasets
· Connecting BI/Visualization tools to Apache Spark to analyze large datasets internally
The speakers also touched on data governance, on-boarding new data rapidly, how to balance rapid agility and time to market with critical decision support and customer interaction. They also shared examples of problems that Apache Spark is not optimized for.
For more information on the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/
Data Lakes are early in the Gartner hype cycle, but companies are getting value from their cloud-based data lake deployments. Break through the confusion between data lakes and data warehouses and seek out the most appropriate use cases for your big data lakes.
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...StampedeCon
This session addresses the first problems of Big Data & Analytics–Identifying, Indexing, Connecting and Gaining Insight of Existing Data to Drive Value. HPE’s Chief Field Technologist will give her perspectives on Enterprise Search as a Fundamental Cornerstone of Building a Data Driven Enterprise.
Whether you are interested in healthcare data analytics or looking to get started with big data and marketing, these fundamental principles from data experts will contribute to your success. http://www.qubole.com/new-series-big-data-tips/
Big Data Analytics & Travel Industry – The Best Deal AroundSPEC INDIA
Technologies like Internet of Things - IoT are rapidly integrating with analytical platforms to ensure continual improvements in the travel experiences. With technology being easily accessible, the contemporary traveler expects more from the integration of technologies with enterprises.
Customers will keep looking for innovative as well as competitively priced means for vacations. Big data and analytics services are projected to become the backbone of the travel industry with a marked digital transformation.
Get More Insight on Big Data & the Travel Industry at:
http://blog.spec-india.com/big-data-analytics-services-travel-industry-best-deal-around/
Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)Hellmuth Broda
While we bemoan the ever increasing data tsunami new technologies allow to harvest the gold nuggets in the hay stack.
Using the example of the Pharmaceutical Industry some of the possible business uses for Big Data Analitics are outlined.
Presentation at Data Summit 2015 in NYC.
Elliott Cordo shared real-world insights across a range of topics, including the evolving best practices for building a data warehouse on Hadoop that also coexists with multiple processing frameworks and additional non-Hadoop storage platforms, the place for massively parallel-processing and relational databases in analytic architectures, and the ways in which the cloud offers the ability to quickly and cost-effectively establish a scalable platform for your Big Data warehouse.
For more information, visit www.casertaconcepts.com
Data Governance, Compliance and Security in Hadoop with ClouderaCaserta
In our recent Big Data Warehousing Meetup, we discussed Data Governance, Compliance and Security in Hadoop.
As the Big Data paradigm becomes more commonplace, we must apply enterprise-grade governance capabilities for critical data that is highly regulated and adhere to stringent compliance requirements. Caserta and Cloudera shared techniques and tools that enables data governance, compliance and security on Big Data.
For more information, visit www.casertaconcepts.com
Workshop with Joe Caserta, President of Caserta Concepts, at Data Summit 2015 in NYC.
Data science, the ability to sift through massive amounts of data to discover hidden patterns and predict future trends and actions, may be considered the "sexiest" job of the 21st century, but it requires an understanding of many elements of data analytics. This workshop introduced basic concepts, such as SQL and NoSQL, MapReduce, Hadoop, data mining, machine learning, and data visualization.
For notes and exercises from this workshop, click here: https://github.com/Caserta-Concepts/ds-workshop.
For more information, visit our website at www.casertaconcepts.com
MT101 Dell OCIO: Delivering data and analytics in real timeDell EMC World
Today’s business operations increasingly rely on sophisticated integration of data streaming across the enterprise. This requires an analytics ecosystem that is highly current and highly available. This session explores the infrastructure and methods Dell IT used for keeping the complex flows, integration processes, BI, and analytics operating 24x7.
Architecting for Big Data: Trends, Tips, and Deployment OptionsCaserta
Joe Caserta, President at Caserta Concepts addressed the challenges of Business Intelligence in the Big Data world at the Third Annual Great Lakes BI Summit in Detroit, MI on Thursday, March 26. His talk "Architecting for Big Data: Trends, Tips and Deployment Options," focused on how to supplement your data warehousing and business intelligence environments with big data technologies.
For more information on this presentation or the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/.
Accelerate Self-Service Analytics with Data Virtualization and VisualizationDenodo
Watch full webinar here: https://bit.ly/3fpitC3
Enterprise organizations are shifting to self-service analytics as business users need real-time access to holistic and consistent views of data regardless of its location, source or type for arriving at critical decisions.
Data Virtualization and Data Visualization work together through a universal semantic layer. Learn how they enable self-service data discovery and improve performance of your reports and dashboards.
In this session, you will learn:
- Challenges faced by business users
- How data virtualization enables self-service analytics
- Use case and lessons from customer success
- Overview of the highlight features in Tableau
Building a New Platform for Customer Analytics Caserta
Caserta Concepts and Databricks partner up to bring you this insightful webinar on how a business can choose from all of the emerging big data technologies to figure out which one best fits their needs.
What is a Data Warehouse and How Do I Test It?RTTS
ETL Testing: A primer for Testers on Data Warehouses, ETL, Business Intelligence and how to test them.
Are you hearing and reading about Big Data, Enterprise Data Warehouses (EDW), the ETL Process and Business Intelligence (BI)? The software markets for EDW and BI are quickly approaching $22 billion, according to Gartner, and Big Data is growing at an exponential pace.
Are you being tasked to test these environments or would you like to learn about them and be prepared for when you are asked to test them?
RTTS, the Software Quality Experts, provided this groundbreaking webinar, based upon our many years of experience in providing software quality solutions for more than 400 companies.
You will learn the answer to the following questions:
• What is Big Data and what does it mean to me?
• What are the business reasons for a building a Data Warehouse and for using Business Intelligence software?
• How do Data Warehouses, Business Intelligence tools and ETL work from a technical perspective?
• Who are the primary players in this software space?
• How do I test these environments?
• What tools should I use?
This slide deck is geared towards:
QA Testers
Data Architects
Business Analysts
ETL Developers
Operations Teams
Project Managers
...and anyone else who is (a) new to the EDW space, (b) wants to be educated in the business and technical sides and (c) wants to understand how to test them.
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaCloudera, Inc.
Transitioning to a Big Data architecture is a big step; and the complexity of moving existing analytical services onto modern platforms like Cloudera, can seem overwhelming.
Predictive Analytics - Big Data Warehousing MeetupCaserta
Predictive analytics has always been about the future, and the age of big data has made that future an increasingly dynamic place, filled with opportunity and risk.
The evolution of advanced analytics technologies and the continual development of new analytical methodologies can help to optimize financial results, enable systems and services based on machine learning, obviate or mitigate fraud and reduce cybersecurity risks, among many other things.
Caserta Concepts, Zementis, and guest speaker from FICO presented the strategies, technologies and use cases driving predictive analytics in a big data environment.
For more information, visit www.casertaconcepts.com or contact us at info@casertaconcepts.com
Five Attributes to a Successful Big Data StrategyPerficient, Inc.
The veracity, variety and sheer volume of data is increasing exponentially. With Hadoop and NoSQL solutions becoming commonplace, there are many technical options for managing and extracting value from this data. Many companies create labs to experiment with Big Data solutions, only later become IT playgrounds or unstructured dumping grounds.
To help avoid these pitfalls,companies with successful Big Data projects approach challenges by formulating a strategy that assures real business value is derived from their Big Data investments. In a Perficient poll, 73% of companies stated they are in the early-evaluation stage to find solutions to their Big Data problems and are only beginning to create their strategy.
Join us for a webinar featuring thought-provoking best practices used by successful companies to quickly realize business value from their Big Data investments. You'll learn:
The top five steps to increased business value
What the top companies are doing in Big Data that you need to know
Next steps to lay the ground work for a successful Big Data strategy
Die Big Data Fabric als Enabler für Machine Learning & AIDenodo
Ansehen: https://bit.ly/2Cet17K
Erstklassige Big Data Fabrics liefern verlässliche Insights, gewährleisten höchste End-to-End Sicherheitsstandards und ermöglichen eine konsistente Datenintegration in Echtzeit – während den Business-Anwendern agile Werkzeuge zum selbstgesteuerten Datenkonsum bereitgestellt werden.
Erfahren Sie in dem Vortrag, wie die Big Data Fabric als Enabler für ML & AI:
- den Business-Anwendern und Data Scientists einen schnellen und agilen Datenzugriff via Self-Services ermöglicht
- Data Governance und Security Richtlinien zentral und verlässlich managebar macht
- relevante Insights aus aktuellen und konsistenten Daten liefert
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with HadoopPrecisely
With so many new, evolving frameworks, tools, and languages, a new big data project can lead to confusion and unwarranted risk.
Many organizations have found Data Warehouse Optimization with Hadoop to be a good starting point on their Big Data journey. Offloading ETL workloads from the enterprise data warehouse (EDW) into Hadoop is a well-defined use case that produces tangible results for driving more insights while lowering costs. You gain significant business agility, avoid costly EDW upgrades, and free up EDW capacity for faster queries. This quick win builds credibility and generates savings to reinvest in more Big Data projects.
A proven reference architecture that includes everything you need in a turnkey solution – the Hadoop distribution, data integration software, servers, networking and services – makes it even easier to get started.
Trends in Enterprise Advanced AnalyticsDATAVERSITY
If you missed out on all the trends for 2019 published in
December, or even if you caught some of them, this one merits your time. We’ll be going into 2019 and beyond, since the winners will have an eye on the long view for the source of competitive advantage that is analytics.
It is a fascinating, explosive time for enterprise
analytics.
It is from the position of analytics leadership that the
mission will be executed and company leadership will emerge. The data professional is absolutely sitting on the performance of the company in this information economy and has an obligation to demonstrate the possibilities and originate the architecture, data and projects that will deliver analytics.
After all, no matter what business you’re in, you’re in the business of analytics.
The coming years will be full of big changes in enterprise data architecture. William will kick off the Advanced Analytics 2019 series with a discussion of the trends winning organizations should build into their plans, expectations, vision and awareness now.
Similar to Big Data's Impact on the Enterprise (20)
Using Machine Learning & Spark to Power Data-Driven MarketingCaserta
Joe Caserta provides a statistically-driven model to understanding the customer path to purchase, which combines online, offline and third-party data sources. He shows how customer data is fed to machine learning, which assigns weighted credit to customer interactions in order to give insight to what marketing activities truly matter. This presentation is from Caserta's February 2018 Big Data Warehousing Meetup co-hosted with Databricks.
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Caserta
Joe Caserta explores the world of analytics, tech, and AI to paint a picture of where business is headed. This presentation is from the CDAO Exchange in Miami 2018.
General Data Protection Regulation - BDW Meetup, October 11th, 2017Caserta
Caserta Presentation:
General Data Protection Regulation (GDPR) is a business and technical challenge for companies worldwide - and the deadlines are coming fast! American institutions that do business in the EU or have customers from the EU will have their data practices affected. With this in mind, Caserta – joined by Waterline Data, Salt Recruiting, and Squire Patton Boggs – hosted a BDW Meetup on the GDPR, which is perhaps the most controversial data legislation that has been passed to date.
Joe Caserta, Founding President, Caserta, spoke on the basics of the GDPR, how it will impact data privacy around the world, and some techniques geared towards compliance.
Introduction to Data Science (Data Summit, 2017)Caserta
At DBTA's 2017 Data Summit in New York, NY, Caserta Founder & President, Joe Caserta, and Senior Architect, Bill Walrond, gave a pre-conference workshop presenting the ins and outs of data science. Data scientist has been dubbed the "sexiest" job of the 21st century, but it requires an understanding of many different elements of data analysis. This presentation dives into the fundamentals of data exploration, mining, and preparation, applying the principles of statistical modeling and data visualization in real-world applications.
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Caserta
Looker Presentation
Caserta Concepts, Blue Apron, and Looker are joining forces to recount the journey of an enterprise-level migration to the cloud with exciting new technologies. This is the analytics platform of the future.
Greg Wells shares Caserta's view on Enterprise Cloud Adoption and how this is impacting BI and the data architecture needed to support big data analytics. Daniel Mintz, Looker’s Chief Data Evangelist, presents a comprehensive introduction to Looker, and Jason Jho, Head of Data Engineering, Blue Apron, gives the audience an in-depth look at the environment that is helping Blue Apron achieve success by using data to become an analytics-driven company.
During this Big Data Warehousing Meetup, Caserta Concepts and Databricks addressed the number one operational and analytic goal of nearly every organization today – to have complete view of every customer. Customer Data Integration (CDI) must be implemented to cleanse and match customer identities within and across various data systems. CDI has been a long-standing data engineering challenge, not just one of logic and complexity but also of performance and scalability.
The speakers brought together best practice techniques with Apache Spark to achieve complete CDI.
Speakers:
Joe Caserta, President, Caserta Concepts
Kevin Rasmussen, Big Data Engineer, Caserta Concepts
Vida Ha, Lead Solutions Engineer, Databricks
The sessions covered a series of problems that are adequately solved with Apache Spark, as well as those that are require additional technologies to implement correctly. Topics included:
· Building an end-to-end CDI pipeline in Apache Spark
· What works, what doesn’t, and how do we use Spark we evolve
· Innovation with Spark including methods for customer matching from statistical patterns, geolocation, and behavior
· Using Pyspark and Python’s rich module ecosystem for data cleansing and standardization matching
· Using GraphX for matching and scalable clustering
· Analyzing large data files with Spark
· Using Spark for ETL on large datasets
· Applying Machine Learning & Data Science to large datasets
· Connecting BI/Visualization tools to Apache Spark to analyze large datasets internally
The speakers also touched on data governance, on-boarding new data rapidly, how to balance rapid agility and time to market with critical decision support and customer interaction. They also shared examples of problems that Apache Spark is not optimized for.
For more information on the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/
Introducing Kudu, Big Data Warehousing MeetupCaserta
Not just an SQL interface or file system, Kudu - the new, updating column store for Hadoop, is changing the storage landscape. It's easy to operate and makes new data immediately available for analytics or operations.
At the Caserta Concepts Big Data Warehousing Meetup, our guests from Cloudera outlined the functionality of Kudu and talked about why it will become an integral component in big data warehousing on Hadoop.
To learn more about what Caserta Concepts has to offer, visit http://casertaconcepts.com/
During a Big Data Warehousing Meetup in NYC, Elliott Cordo, Chief Architect at Caserta Concepts discussed emerging trends in real time data processing. The presentation included processing frameworks such as Spark and Storm, as well datastore technologies ranging from NoSQL to Hadoop. He also discussed exciting new AWS services such as Lambda, Kenesis, and Kenesis Firehose.
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Caserta
Joe Caserta went over the details inside the big data ecosystem and the Caserta Concepts Data Pyramid, which includes Data Ingestion, Data Lake/Data Science Workbench and the Big Data Warehouse. He then dove into the foundation of dimensional data modeling, which is as important as ever in the top tier of the Data Pyramid. Topics covered:
- The 3 grains of Fact Tables
- Modeling the different types of Slowly Changing Dimensions
- Advanced Modeling techniques like Ragged Hierarchies, Bridge Tables, etc.
- ETL Architecture.
He also talked about ModelStorming, a technique used to quickly convert business requirements into an Event Matrix and Dimensional Data Model.
This was a jam-packed abbreviated version of 4 days of rigorous training of these techniques being taught in September by Joe Caserta (Co-Author, with Ralph Kimball, The Data Warehouse ETL Toolkit) and Lawrence Corr (Author, Agile Data Warehouse Design).
For more information, visit http://casertaconcepts.com/.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
3. @joe_Caserta
Launched Big Data practice
Co-author, with Ralph Kimball, The
Data Warehouse ETL Toolkit (Wiley)
Data Analysis, Data Warehousing and
Business Intelligence since 1996
Began consulting database programing
and data modeling 25+ years hands-on experience
building database solutions
Founded
Caserta Concepts
Web log analytics solution published in
Intelligent Enterprise
Launched Data Science, Data
Interaction and Cloud practices
Laser focus on extending Data
Analytics with Big Data solutions
1986
2004
1996
2009
2001
2013
2012
2014
Dedicated to Data Governance
Techniques on Big Data (Innovation)
Top 20 Big Data
Consulting - CIO Review
Top 20 Most Powerful
Big Data consulting firms
Launched Big Data Warehousing
(BDW) Meetup NYC: 2,000+ Members
2015 Awarded for getting data out
of SAP for data analytics
Established best practices for big data
ecosystem implementations
Caserta Timeline
Awarded Top
Healthcare Analytics
Solution Provider
4. @joe_Caserta
About Caserta Concepts
• Technology innovation consulting company with expertise in:
• Big Data Solutions
• Data Warehousing
• Business Intelligence
• Solve highly complex business data challenges
• Award-winning solutions
• Business Transformation
• Maximize Data Value
• Innovation partner:
• Transformative Data Strategies
• Modern Data Engineering
• Advanced Analytics
• Strategic Consulting
• Technical Architecture
• Design and Build Solutions
• Data Science & Analytics
• Data on the Cloud
• Data Interaction & Visualization
6. @joe_Caserta
Caserta Innovation Lab (CIL)
• Internal laboratory established to test & develop solution concepts and
ideas
• Used to accelerate client projects
• Examples:
• Search (SOLR) based BI
• Big Data Governance Toolkit
• Text Analytics on Social Network Data
• Continuous Integration / End-to-end streaming (Spark)
• Recommendation Engine Optimization
• Relationship Intelligence (Graph DB/Search)
• Others (confidential)
• CIL is hosted on
10. @joe_Caserta
Innovation is the only sustainable competitive advantage a company can have
Innovations may fail, but companies that don’t innovate will fail
12. @joe_Caserta
What you need to know (according to Joe)
Hadoop Distribution: Apache, Cloudera, Hortonworks, MapR, IBM
Tools:
Hive: Map data to structures and use SQL-like queries
Pig: Data transformation language for big data
Sqoop: Extracts external sources and loads Hadoop
Spark: General-purpose cluster computing framework
Storm: Real-time ETL
NoSQL:
Document: MongoDB, CouchDB
Graph: Neo4j, Titan
Key Value: Riak, Redis
Columnar: Cassandra, Hbase
Search: Lucene, Solr, ElasticSearch
Languages: Python, Java, R, Scala
13. @joe_Caserta
Enrollments
Claims
Finance
ETL
Ad-Hoc Query
Horizontally Scalable Environment - Optimized for Analytics
Big Data Lake
Canned Reporting
Big Data Analytics
NoSQL
Databases
ETL
Ad-Hoc/Canned
Reporting
Traditional BI
Spark MapReduce Pig/Hive
N1 N2 N4N3 N5
Hadoop Distributed File System (HDFS)
Traditional
EDW
Others…
The Evolution of Modern Data Engineering
Data Science
14. @joe_Caserta
The Progression of Analytics
Descriptive
Analytics
Diagnostic
Analytics
Predictive
Analytics
Prescriptive
Analytics
What
happened?
Why did it
happen?
What will
happen?
How can we make
It happen?
Data Analytics Sophistication
BusinessValue
Source: Gartner
Reports Correlations Predictions Recommendations
15. @joe_Caserta
Governing Big Data
Before Data Governance
Users trying to produce reports from raw source data
No Data Conformance
No Master Data Management
No Data Quality processes
No Trust: Two analysts were almost guaranteed to come up
with two different sets of numbers!
Before Big Data Governance
We can put “anything” in Hadoop
We can analyze anything
We’re scientists, we don’t need IT, we make the rules
Rule #1: Dumping data into Hadoop with no repeatable process, procedure, or
governance will create a mess
Rule #2: Information harvested from an ungoverned systems will take us back to
the old days: No Trust = Not Actionable
16. @joe_Caserta
•This is the ‘people’ part. Establishing Enterprise Data Council,
Data Stewards, etc.Organization
•Definitions, lineage (where does this data come from),
business definitions, technical metadataMetadata
•Identify and control sensitive data, regulatory compliancePrivacy/Security
•Data must be complete and correct. Measure, improve,
certify
Data Quality and
Monitoring
•Policies around data frequency, source availability, etc.Business Process Integration
•Ensure consistent business critical data i.e. Members,
Providers, Agents, etc.Master Data Management
•Data retention, purge schedule, storage/archiving
Information Lifecycle
Management (ILM)
Chief Data Officer – Data Governance
• Add Big Data to overall framework and assign responsibility
• Add data scientists to the Stewardship program
• Assign stewards to new data sets (twitter, call center logs, etc.)
• Graph databases are more flexible than relational
• Lower latency service required
• Distributed data quality and matching algorithms
• Data Quality and Monitoring (probably home grown, drools?)
• Quality checks not only SQL: machine learning, Pig and Map Reduce
• Acting on large dataset quality checks may require distribution
• Larger scale
• New datatypes
• Integrate with Hive Metastore, HCatalog, home grown tables
• Secure and mask multiple data types (not just tabular)
• Deletes are more uncommon (unless there is regulatory requirement)
• Take advantage of compression and archiving (like AWS Glacier)
• Data detection and masking on unstructured data upon ingest
• Near-zero latency, DevOps, Core component of business operations
for Big Data
17. @joe_Caserta
Big
Data
Warehouse
Data Science Workspace
Data Lake – Integrated Sandbox
Landing Area – Source Data in “Full Fidelity”
The Big Data Pyramid
Metadata Catalog
ILM who has access,
how long do we
“manage it”
Raw machine data
collection, collect
everything
Data is ready to be turned into
information: organized, well
defined, complete.
Agile business insight through data-
munging, machine learning, blending
with external data, development of
to-be BDW facts
Metadata Catalog
ILM who has access, how long do we
“manage it”
Data Quality and Monitoring
Monitoring of completeness of data
Metadata Catalog
ILM who has access, how long do we “manage it”
Data Quality and Monitoring Monitoring of
completeness of data
Data has different governance demands at each tier
Only top tier of the pyramid is fully governed
We refer to this as the Trusted tier of the Big Data Warehouse.
Fully Data Governed ( trusted)
User community arbitrary queries and
reporting
21. @joe_Caserta
Implementing Data Governance on Big Data
People, Processes and Business commitment is still critical!
Caution: Some Assembly Required
The V’s require robust tooling:
Unfortunately the toolsets available
are pretty thin: Some of the most
hopeful tools are brand new or in
incubation!
Enterprise big data governance
implementations combine products
with some custom built components
23. @joe_Caserta
High Volume Trade Data Project
• The equity trading arm of a large US bank needed to scale its
infrastructure to enable the ability to process/parse trade data real-time
and calculate aggregations/statistics
~ 1.4Million/second ~12 Billion messages/day ~240 Billon/month
• The solution needed to map the raw data to a data model in memory or
low latency (for real-time), while persisting mapped data to disk (for end
of day reporting).
• The proposed solution also needed to handle
ad-hoc data requests for data analytics.
24. @joe_Caserta
The Data
• Primarily FIX messages: Financial Information Exchange
• Established in early 90's as a standard for trade data communication
widely used throughout the industry
• Basically a delimited file of variable attribute-value pairs
• Looks something like this:
8=FIX.4.2 | 9=178 | 35=8 | 49=PHLX | 56=PERS | 52=20071123-05:30:00.000 |
11=ATOMNOCCC9990900 | 20=3 | 150=E | 39=E | 55=MSFT | 167=CS | 54=1 | 38=15 | 40=2 |
44=15 | 58=PHLX EQUITY TESTING | 59=0 | 47=C | 32=0 | 31=0 | 151=15 | 14=0 | 6=0 |
10=128 |
• A single trade can be comprised of 100's of such messages, although
typical trades have about a dozen
25. @joe_Caserta
Data Quality
Rules Engine
Storm Cluster
Trade
Data
d3.js Real-time
Analytics
Hadoop Cluster
Low Latency
Analytics
Atomic data
Aggregates
Event Monitors
• The Kafka messaging system is used for ingestion
• Storm is used for real-time ETL and outputs atomic data and
derived data needed for analytics
• Redis is used as a reference data lookup cache
• Real time analytics are produced from the aggregated data.
• Higher latency ad-hoc analytics are done in Hadoop using Pig
and Hive
Kafka
High Volume Real-time Analytics
Solution Architecture
26. @joe_Caserta
Electronic Medical Records (EMR) Analytics
Hadoop Data LakeEdge Node
`
100k
files
variant 1..n
…
variant 1..n
HDFS
Put
Netezza
DW
Sqoop
Pig EMR
Processor
UDF
Library
Provider
table
(parquet)
Member table
(parquet)
Python Wrapper
Provider
table
Member
table
Forqlift
Sequenc
e Files
…
variant 1..n
Sequenc
e Files
…
15 More
Entities
(parquet)
More
Dimensions
And
Facts
• Receive Electronic Medial Records from various providers in various formats
• Address Hadoop ‘small file’ problem
• No barrier for onboarding and analysis of new data
• Blend new data with Data Lake and Big Data Warehouse
• Machine Learning
• Text Analytics
• Natural Language Processing
• Reporting
• Ad-hoc queries
• File ingestion
• Information Lifecycle Mgmt
27. @joe_Caserta
Some Thoughts – Enable the Future
Big Data requires the convergence of
data governance, advanced data
engineering, data science and business
smarts
Make sure your data can be trusted
and people can be held accountable for
impact caused by low data quality.
It takes a village to achieve all the tasks
required for effective big data strategy
& execution
Get experts that have done it before!
Achieve the impossible…..
… everything is impossible until someone does is!
28. @joe_Caserta
Workshops: www.casertaconcepts.com/training
Sept 21-22 (2 days), Agile Data Warehousing
taught by Lawrence Corr
Sept 23-24 (2 days), ETL Architecture and Design
taught by Joe Caserta
(Big Data module added)
SAVE $300 BY REGISTERING BEFORE JULY 31ST!
Agile DW & ETL Training in NYC, 2015
We focused our attention on building a single version of the truth
We mainly applied data governance on the EDW itself and a few primary supporting systems –like MDM.
We had a fairly restrictive set of tools for using the EDW data Enterprise BI tools It was easier to GOVERN how the data would be used.
Volume, Variety, Veracity and Veolcity
Spark would make this easier and could leverage same DQ code