In this presentation at DAMA New York, Joe started by asking a key question: why are we doing this? Why analyze and share all these massive amounts of data? Basically, it comes down to the belief that in any organization, in any situation, if we can get the data and make it correct and timely, its insights will become immediately actionable, helping companies function more nimbly and successfully. Enabling the use of data can be a world-changing, world-improving activity, and this session presents the steps necessary to get you there. Joe explained the concept of the "data lake" and also emphasized the role of a strong data governance strategy that incorporates the seven components needed for a successful program.
For more information on this presentation or Caserta Concepts, visit our website at http://casertaconcepts.com/.
Incorporating the Data Lake into Your Analytic Architecture (Caserta)
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented Incorporating the Data Lake into Your Analytics Architecture.
For more information on the services offered by Caserta Concepts, visit our website at http://casertaconcepts.com/.
Data Lakes are early in the Gartner hype cycle, but companies are getting value from their cloud-based data lake deployments. Break through the confusion between data lakes and data warehouses and seek out the most appropriate use cases for your big data lakes.
Meaning making: separating signal from noise. How do we transform the customer's next input into an action that creates a positive customer experience? We make the data more intelligent, so that it is able to guide our actions. The Data Lake builds on Big Data strengths by automating many manual development tasks, providing self-service features to end users, and adding an intelligent management layer to organize it all. This results in lower cost to create solutions, "smart" analytics, and faster time to business value.
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendees learned how a global leader in the test, measurement and control systems market reduced its big data implementation time from 18 months to just a few months.
Speakers shared how to provide a business-user-friendly, self-service environment for data discovery and analytics, focusing on how to extend and optimize Hadoop-based analytics and highlighting the advantages and practical applications of deploying in the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft - Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Creating a Next-Generation Big Data Architecture (Perficient, Inc.)
If you’ve spent time investigating Big Data, you quickly realize that the issues surrounding it are often complex to analyze and solve. The sheer volume, velocity and variety of Big Data change the way we think about data, including how enterprises approach data architecture.
A significant reduction in the cost of processing, managing, and storing data, combined with the need for business agility and analytics, requires CIOs and enterprise architects to rethink their enterprise data architecture and develop a next-generation approach to the complexities of Big Data.
Creating the data architecture while integrating Big Data into the heart of the enterprise data architecture is a challenge. This webinar covered:
- Why Big Data capabilities must be strategically integrated into an enterprise’s data architecture
- How a next-generation architecture can be conceptualized
- The key components of a robust next-generation architecture
- How to incrementally transition to a next-generation data architecture
The Data Lake - Balancing Data Governance and Innovation (Caserta)
Joe Caserta gave the presentation "The Data Lake - Balancing Data Governance and Innovation" at DAMA NY's one-day mini-conference on May 19th. Speakers covered emerging trends in Data Governance, especially around Big Data.
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
The Data Lake and Getting Businesses the Big Data Insights They Need (Dunn Solutions Group)
Do terms like "Data Lake" confuse you? You’re not alone. With all of the technology buzzwords flying around today, it can be a chore to keep up with and clearly understand each of them. However, a data lake is definitely worth taking the time to understand. With data lake technology, companies are finally able to keep all of their disparate information and data streams in one secure location, ready for consumption at any time, including structured, unstructured, and semi-structured data. For more information on our Big Data Consulting Services, don’t hesitate to visit us online at: http://bit.ly/2fvV5rR
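Keeping structured, semi-structured, and unstructured data in one place, ready for consumption at any time, is often described as "schema on read": data lands as-is, and a structure is applied only when someone reads it. The Python sketch below illustrates that idea under stated assumptions: an in-memory dictionary stands in for lake storage, and the paths, helper names (`land`, `read_csv`, `read_json_lines`) and sample records are all hypothetical. A real lake would sit on distributed or object storage rather than a dict.

```python
import csv
import io
import json

# Hypothetical in-memory "lake": raw payloads keyed by path, stored exactly as received.
lake: dict[str, bytes] = {}


def land(path: str, payload: bytes) -> None:
    """Store a raw payload without validating or transforming it (no schema on write)."""
    lake[path] = payload


def read_csv(path: str) -> list[dict[str, str]]:
    """Interpret a payload as CSV only at read time (schema on read)."""
    text = lake[path].decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def read_json_lines(path: str) -> list[dict]:
    """Parse newline-delimited JSON at read time; records may carry different fields."""
    lines = lake[path].decode("utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]


# Structured, semi-structured and unstructured data land side by side.
land("raw/crm/customers.csv", b"id,name\n1,Acme\n2,Globex\n")
land("raw/web/clicks.jsonl",
     b'{"id": 1, "page": "/home"}\n{"id": 2, "page": "/pricing", "ref": "ad"}\n')
land("raw/support/ticket-17.txt", "Customer 2 cannot log in.".encode("utf-8"))

customers = read_csv("raw/crm/customers.csv")
clicks = read_json_lines("raw/web/clicks.jsonl")
note = lake["raw/support/ticket-17.txt"].decode("utf-8")
```

The second click record carries a `ref` field that the first one lacks, and nothing rejected it on the way in. That flexibility is what makes a lake quick to fill, and it is also why governance still matters once the data is there.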
Joe Caserta was a featured speaker, along with MIT Sloan School faculty and other industry thought-leaders. His session 'You're the New CDO, Now What?' discussed how new CDOs can accomplish their strategic objectives and overcome tactical challenges in this emerging executive leadership role.
In its tenth year, the MIT CDOIQ Symposium 2016 continues to explore the developing role of the Chief Data Officer.
For more information, visit http://casertaconcepts.com/
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra... (Data Con LA)
Why and how has the Big Data-based Enterprise Data Lake, built on NoSQL and SQL technologies, become significantly more effective at solving enterprise data challenges than its predecessor, the EDW, which tried and failed to solve the same problem using SQL databases alone?
This white paper presents the opportunities offered by data lakes and advanced analytics, as well as the challenges of integrating, mining and analyzing the data collected from these sources. It covers the important characteristics of the data lake architecture and the Data and Analytics as a Service (DAaaS) model. It also delves into the features of a successful data lake and its optimal design, and into how data, applications, and analytics are strung together to speed up the generation of insights for industry, with the data lake serving as a powerful architecture for mining and analyzing unstructured data.
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016 (StampedeCon)
Hadoop adoption is a journey. Depending on the business, the process can take weeks, months, or even years. Hadoop is a transformative technology, so the challenges have less to do with the technology and more to do with how a company adapts to a new way of thinking about data. Companies that have run application-driven businesses for the last two decades face real challenges in suddenly becoming data-driven. They need to begin thinking less in terms of single, siloed servers and more in terms of “the cluster”.
The cluster becomes the center of data gravity, drawing all the applications to it. Companies, especially their IT organizations, embark on a process of learning how to maintain and operationalize this environment and provide the data lake as a service to the business. They must empower the business by providing the resources for the use cases that drive both renovation and innovation. IT needs to adopt new technologies and new methodologies that enable these solutions. This is not technology for technology's sake. Hadoop is a data platform serving and enabling all facets of an organization. Building out and expanding this platform is the ongoing journey, as word gets out to the business that it can have any data it wants at any time. Success is what drives the journey.
The length of the journey varies from company to company. Sometimes the challenges stem from the size of the company, but often they stem from the difficulty of unseating established IT processes that companies have adopted without forethought over the past two decades. Companies must navigate through the noise, and sifting through it to find the solutions that bring real value takes time. As the platform matures and becomes mainstream, more and more companies are finding it easier to adopt Hadoop. Hundreds of companies have already taken many steps; hundreds more have taken the first step. As the wave of successful Hadoop adoption continues, more companies will see the value in starting the journey and paving the way for others.
Big data architectures and the data lake (James Serra)
With so many new technologies, it can be confusing to determine the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs. bottom-up approach to analytics, and how you can use a data lake and an RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and on why you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
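To make the idea of using a data lake and an RDBMS data warehouse together more concrete, here is a minimal Python sketch under assumed names. Raw events stay untouched in a lake-style list, a curation step keeps only the records that satisfy the warehouse's required fields, and those rows are loaded into a schema-enforced table, with an in-memory SQLite database standing in for the warehouse. The table `fact_orders`, its fields, and the `curate` helper are illustrative assumptions, not part of the presentation.

```python
import json
import sqlite3

# Raw zone of a hypothetical lake: events are kept exactly as they arrived.
raw_events = [
    '{"order_id": 1, "region": "east", "amount": 120.0}',
    '{"order_id": 2, "region": "west", "amount": 80.5}',
    '{"order_id": 3, "region": "east"}',  # missing the required amount
    "not json at all",  # unparseable record
    '{"order_id": 4, "region": "west", "amount": 19.5}',
]

REQUIRED = {"order_id", "region", "amount"}


def curate(lines):
    """Split raw lines into warehouse-ready rows and rejects; the raw list is not modified."""
    good, rejected = [], []
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            rejected.append(line)
            continue
        if isinstance(rec, dict) and REQUIRED.issubset(rec):
            good.append((int(rec["order_id"]), str(rec["region"]), float(rec["amount"])))
        else:
            rejected.append(line)
    return good, rejected


rows, rejected = curate(raw_events)

# Warehouse side: a fixed schema enforced on write, queried with plain SQL.
dw = sqlite3.connect(":memory:")
dw.execute(
    "CREATE TABLE fact_orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
)
dw.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
dw.commit()
totals = dict(dw.execute(
    "SELECT region, SUM(amount) FROM fact_orders GROUP BY region ORDER BY region"
))
```

The lake keeps all five raw records, including the two rejects, so they can be reprocessed later; the warehouse only ever sees the three rows that passed curation. That rejection step is a natural place to attach the data governance tasks the description says a lake still needs.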
Data Governance, Compliance and Security in Hadoop with Cloudera (Caserta)
In our recent Big Data Warehousing Meetup, we discussed Data Governance, Compliance and Security in Hadoop.
As the Big Data paradigm becomes more commonplace, we must apply enterprise-grade governance capabilities to critical data that is highly regulated and subject to stringent compliance requirements. Caserta and Cloudera shared techniques and tools that enable data governance, compliance and security on Big Data.
For more information, visit www.casertaconcepts.com
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014 (MapR Technologies)
View this webinar presentation from CenturyLink Technology Solutions (formerly Savvis) and MapR as we deconstruct and demystify “the enterprise big data stack.” We provide a more holistic view of the landscape, explore use cases that show how you can derive business value from it, and share best practices for navigating the fragmented big data environment.
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake... (NoSQLmatters)
Come to this deep dive on how Pivotal's Data Lake vision is evolving by embracing next-generation in-memory data exchange and compute technologies around Spark and Tachyon. Did we say Hadoop and SQL? And what's the shortest path from the past to the future state? The next generation of data lake technology will leverage in-memory processing, with an architecture that supports multiple data analytics workloads within a single environment: SQL, R, Spark, batch and transactional.
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop (Caserta)
In our most recent Big Data Warehousing Meetup, we learned about the transition from Big Data 1.0, built on Hadoop 1.x and nascent technologies, to Hadoop 2.x, where YARN enables distributed ETL, SQL and analytics solutions. Caserta Concepts Chief Architect Elliott Cordo and an Actian engineer covered the complete data value chain of an enterprise-ready platform, including data connectivity, collection, preparation, optimization and analytics with end-user access.
For more information on our services or upcoming events, please visit our website at http://www.casertaconcepts.com/.
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta... (StampedeCon)
This session addresses the first problems of Big Data & Analytics: identifying, indexing, connecting and gaining insight into existing data to drive value. HPE’s Chief Field Technologist will give her perspective on Enterprise Search as a fundamental cornerstone of building a data-driven enterprise.
Data Lake, Virtual Database, or Data Hub - How to Choose? (DATAVERSITY)
Data integration is just plain hard and there is no magic bullet. That said, three new data integration techniques do ameliorate the misery, making silo-busting possible, if not trivial. The three approaches – data lakes, virtual databases (aka federated databases), and data hubs – are a boon to organizations big enough to have separate systems, separate lines of business, and redundant acquired or COTS data stores. Each approach has its place, but how do you make the right decision about which data silo integration approach to choose and when?
This webinar describes how you can use the key concepts of data Movement, Harmonization, and Indexing to determine what you are giving up or investing in, and make the best decision for your project.
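One way to picture Movement, Harmonization, and Indexing is a toy data hub. Records are copied out of their silos (movement), mapped into a single canonical shape (harmonization), and made searchable through an inverted index (indexing). The Python sketch below is an illustration under assumed data, not the webinar's method; the silo contents and the functions `from_crm`, `from_billing`, and `search` are hypothetical. Roughly speaking, a data lake would perform the movement step with little harmonization, while a virtual database would skip movement and apply the mappings at query time.

```python
from collections import defaultdict

# Two hypothetical silos describing customers with different field names and conventions.
crm = [{"CustID": "C1", "FullName": "Ada Lovelace", "City": "London"}]
billing = [
    {"customer_number": "C1", "name": "A. Lovelace", "city": "london"},
    {"customer_number": "C2", "name": "Alan Turing", "city": "Wilmslow"},
]


# Harmonization: one mapping per source into a shared canonical shape.
def from_crm(r):
    return {"id": r["CustID"], "name": r["FullName"], "city": r["City"].title(), "source": "crm"}


def from_billing(r):
    return {"id": r["customer_number"], "name": r["name"],
            "city": r["city"].title(), "source": "billing"}


# Movement: copy the harmonized records out of the silos into the hub's own store.
hub = [from_crm(r) for r in crm] + [from_billing(r) for r in billing]

# Indexing: an inverted index from lowercase tokens to record positions in the hub.
index = defaultdict(set)
for pos, rec in enumerate(hub):
    for field in ("id", "name", "city"):
        for token in str(rec[field]).lower().replace(".", " ").split():
            index[token].add(pos)


def search(term):
    """Return hub records containing the token, in hub order."""
    return [hub[pos] for pos in sorted(index.get(term.lower(), set()))]
```

Here `search("London")` finds both copies of Ada Lovelace's record, because harmonization normalized the city casing. The two copies are not merged into one customer; matching entities across silos is a further step that this sketch deliberately leaves out.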
Organizations have been collecting, storing, and accessing data from the beginning of computerization. Insights gained from analyzing the data enable them to identify new opportunities, improve core processes, enable continuous learning and differentiation, remain competitive, and thrive in an increasingly challenging business environment.
The well-established data architecture, consisting of a data warehouse, fed from multiple operational data stores, and fronted by BI tools, has served most organizations well. However, over the last two decades, with the explosion of internet-scale data, and the advent of new approaches to data and computational processing, this tried-and-true data architecture has come under strain, and has created both challenges and opportunities for organizations.
In this green paper, we will discuss modern approaches to data architecture that have evolved to address these challenges and provide a framework for companies to build a data architecture and better adapt to increasing demands of the modern business environment. This discussion of data architecture will be tied to the Data Maturity Journey introduced in EQengineered’s June 2021 green paper on Data Modernization.
An overview of Hadoop and the data warehouse from technology and business viewpoints. The presentation also includes some of my personal observations and suggestions for people who want to join the Big Data field.
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re... (Revolution Analytics)
Hortonworks and Revolution Analytics have teamed up to bring the predictive analytics power of R to Hortonworks Data Platform.
Hadoop, a disruptive data processing framework, has made a large impact on today's data ecosystems. Enabling business users to apply their existing skills to Hadoop is necessary to encourage adoption and help businesses get value from their Hadoop investment quickly. R, a prolific and rapidly growing data analysis language, now has a place in the Hadoop ecosystem.
This presentation covers:
- Trends and business drivers for Hadoop
- How Hortonworks and Revolution Analytics play a role in the modern data architecture
- How you can run R natively in Hortonworks Data Platform to simply move your R-powered analytics to Hadoop
Presentation replay at:
http://www.revolutionanalytics.com/news-events/free-webinars/2013/modern-data-architecture-revolution-hortonworks/
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented What Data Do You Have and Where is it?
For more information on the services offered by Caserta Concepts, visit our website at http://casertaconcepts.com/.
Joe Caserta presents his vision of the future of Big Data in the Enterprise.
At the recent Harrisburg University Analytics Summit II, Joe Caserta gave this engaging presentation to Summit attendees including fellow academics, strategists, data scientists and analysts.
Joe Caserta was a featured speaker, along with MIT Sloan School faculty and other industry thought-leaders. His session 'You're the New CDO, Now What?' discussed how new CDOs can accomplish their strategic objectives and overcome tactical challenges in this emerging executive leadership role.
In its tenth year, the MIT CDOIQ Symposium 2016 continues to explore the developing role of the Chief Data Officer.
For more information, visit http://casertaconcepts.com/
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Data Con LA
Why and How has the Big Data based Enterprise Data Lake solution based on No-SQL and SQL technologies has become significantly effective in solving enterprise data challenges than its predecessor EDW which had tried and failed to solve the same problem entirely based on SQL database only.
This white paper will present the opportunities laid down by
data lake and advanced analytics, as well as, the challenges
in integrating, mining and analyzing the data collected from
these sources. It goes over the important characteristics of
the data lake architecture and Data and Analytics as a
Service (DAaaS) model. It also delves into the features of a
successful data lake and its optimal designing. It goes over
data, applications, and analytics that are strung together to
speed-up the insight brewing process for industry’s
improvements with the help of a powerful architecture for
mining and analyzing unstructured data – data lake.
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016StampedeCon
Hadoop adoption is a journey. Depending on the business the process can take weeks, months, or even years. Hadoop is a transformative technology so the challenges have less to do with the technology and more to do with how a company adapts itself to a new way of thinking about data. There are challenges for companies who have lived with an application driven business for the last two decades to suddenly become data driven. Companies need to begin thinking less in terms of single, silo’d servers and more about “the cluster”.
The concept of the cluster becomes the center of data gravity drawing all the applications to it. Companies, especially the IT organizations, embark on a process of understanding how to maintain and operationalize this environment and provide the data lake as a service to the businesses. They must empower the business by providing the resources for the use cases which drive both renovation and innovation. IT needs to adopt new technologies and new methodologies which enable the solutions. This is not technology for technology sake. Hadoop is a data platform servicing and enabling all facets of an organization. Building out and expanding this platform is the ongoing journey as word gets out to businesses that they can have any data they want and any time. Success is what drives the journey.
The length of the journey varies from company to company. Sometimes the challenges are based on the size of the company but many times the challenges are based on the difficulty of unseating established IT processes companies have adopted without forethought for the past two decades. Companies must navigate through the noise. Sifting through the noise to find those solutions which bring real value takes time. As the platform matures and becomes mainstream, more and more companies are finding it easier to adopt Hadoop. Hundreds of companies have already taken many steps; hundreds more have already taken the first step. As the wave of successful Hadoop adoption continues, more and more companies will see the value in starting the journey and paving the way for others.
Big data architectures and the data lakeJames Serra
With so many new technologies it can get confusing on the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs bottoms-up approach to analytics, and how you can use a data lake and a RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
Data Governance, Compliance and Security in Hadoop with ClouderaCaserta
In our recent Big Data Warehousing Meetup, we discussed Data Governance, Compliance and Security in Hadoop.
As the Big Data paradigm becomes more commonplace, we must apply enterprise-grade governance capabilities for critical data that is highly regulated and adhere to stringent compliance requirements. Caserta and Cloudera shared techniques and tools that enables data governance, compliance and security on Big Data.
For more information, visit www.casertaconcepts.com
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014MapR Technologies
View this webinar presentation as CenturyLink Technology Solutions (Formerly Savvis) and MapR as we deconstruct and demystify “the enterprise big data stack.” We provide you with a more holistic view of the landscape, explore use cases to show how you can derive business value from it, and share best practices for navigating through the fragmented big data environment.
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...NoSQLmatters
Come to this deep dive on how Pivotal's Data Lake Vision is evolving by embracing next generation in-memory data exchange and compute technologies around Spark and Tachyon. Did we say Hadoop, SQL, and what's the shortest path to get from past to future state? The next generation of data lake technology will leverage the availability of in-memory processing, with an architecture that supports multiple data analytics workloads within a single environment: SQL, R, Spark, batch and transactional.
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopCaserta
In our most recent Big Data Warehousing Meetup, we learned about transitioning from Big Data 1.0 with Hadoop 1.x with nascent technologies to the advent of Hadoop 2.x with YARN to enable distributed ETL, SQL and Analytics solutions. Caserta Concepts Chief Architect Elliott Cordo and an Actian Engineer covered the complete data value chain of an Enterprise-ready platform including data connectivity, collection, preparation, optimization and analytics with end user access.
For more information on our services or upcoming events, please visit our website at http://www.casertaconcepts.com/.
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...StampedeCon
This session addresses the first problems of Big Data & Analytics–Identifying, Indexing, Connecting and Gaining Insight of Existing Data to Drive Value. HPE’s Chief Field Technologist will give her perspectives on Enterprise Search as a Fundamental Cornerstone of Building a Data Driven Enterprise.
Data Lake, Virtual Database, or Data Hub - How to Choose?DATAVERSITY
Data integration is just plain hard and there is no magic bullet. That said, three new data integration techniques do ameliorate the misery, making silo-busting possible, if not trivial. The three approaches – data lakes, virtual databases (aka federated databases), and data hubs – are a boon to organizations big enough to have separate systems, separate lines of business, and redundant acquired or COTS data stores. Each approach has its place, but how do you make the right decision about which data silo integration approach to choose and when?
This webinar describes how you can use the key concepts of data Movement, Harmonization, and Indexing to determine what you are giving up or investing in, and make the best decision for your project.
Organizations have been collecting, storing, and accessing data from the beginning of computerization. Insights gained from analyzing the data enable them to identify new opportunities, improve core processes, enable continuous learning and differentiation, remain competitive, and thrive in an increasingly challenging business environment.
The well-established data architecture, consisting of a data warehouse, fed from multiple operational data stores, and fronted by BI tools, has served most organizations well. However, over the last two decades, with the explosion of internet-scale data, and the advent of new approaches to data and computational processing, this tried-and-true data architecture has come under strain, and has created both challenges and opportunities for organizations.
In this green paper, we will discuss modern approaches to data architecture that have evolved to address these challenges and provide a framework for companies to build a data architecture and better adapt to increasing demands of the modern business environment. This discussion of data architecture will be tied to the Data Maturity Journey introduced in EQengineered’s June 2021 green paper on Data Modernization.
An overview of Hadoop and data warehousing from technology and business viewpoints. The presentation also includes some of my personal observations and suggestions for people who want to join the field of Big Data.
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re... by Revolution Analytics
Hortonworks and Revolution Analytics have teamed up to bring the predictive analytics power of R to Hortonworks Data Platform.
Hadoop, being a disruptive data processing framework, has made a large impact in the data ecosystems of today. Enabling business users to translate existing skills to Hadoop is necessary to encourage the adoption and allow businesses to get value out of their Hadoop investment quickly. R, being a prolific and rapidly growing data analysis language, now has a place in the Hadoop ecosystem.
This presentation covers:
- Trends and business drivers for Hadoop
- How Hortonworks and Revolution Analytics play a role in the modern data architecture
- How you can run R natively in Hortonworks Data Platform to simply move your R-powered analytics to Hadoop
Presentation replay at:
http://www.revolutionanalytics.com/news-events/free-webinars/2013/modern-data-architecture-revolution-hortonworks/
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented What Data Do You Have and Where is it?
For more information on the services offered by Caserta Concepts, visit our website at http://casertaconcepts.com/.
Joe Caserta presents his vision of the future of Big Data in the Enterprise.
At the recent Harrisburg University Analytics Summit II, Joe Caserta gave this engaging presentation to Summit attendees including fellow academics, strategists, data scientists and analysts.
Architecting for Big Data: Trends, Tips, and Deployment Options by Caserta
Joe Caserta, President at Caserta Concepts, addressed the challenges of Business Intelligence in the Big Data world at the Third Annual Great Lakes BI Summit in Detroit, MI on Thursday, March 26. His talk, "Architecting for Big Data: Trends, Tips and Deployment Options," focused on how to supplement your data warehousing and business intelligence environments with big data technologies.
For more information on this presentation or the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/.
Joe Caserta, President at Caserta Concepts, presented "Setting Up the Data Lake" at a DAMA Philadelphia Chapter Meeting.
For more information on the services offered by Caserta Concepts, visit our website at http://casertaconcepts.com/.
Joe Caserta's 2016 Data Summit Workshop "Introduction to Data Science with Hadoop" on May 9 expanded on his Intro to Data Science Workshop held at last year's Summit. Again, Joe presented to a standing-room-only audience with a focus on the data lake, governance, and the role of the data scientist.
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Against the backdrop of Big Data, the Chief Data Officer, by any name, is emerging as the central player in the business of data, including cybersecurity. The MITCDOIQ Symposium explored the developing landscape, from local organizational issues to global challenges, through case studies from industry, academic, government and healthcare leaders.
Joe Caserta, president at Caserta Concepts, presented "Big Data's Impact on the Enterprise" at the MITCDOIQ Symposium.
Presentation Abstract: Organizations are challenged with managing an unprecedented volume of structured and unstructured data coming into the enterprise from a variety of verified and unverified sources. With that comes the urgency to rapidly maximize value while also maintaining high data quality.
Today we start with some history and the components of data governance and information quality necessary for successful solutions. I then bring it all to life with 2 client success stories, one in healthcare and the other in banking and financial services. These case histories illustrate how accurate, complete, consistent and reliable data results in a competitive advantage and enhanced end-user and customer satisfaction.
To learn more, visit www.casertaconcepts.com
Workshop with Joe Caserta, President of Caserta Concepts, at Data Summit 2015 in NYC.
Data science, the ability to sift through massive amounts of data to discover hidden patterns and predict future trends and actions, may be considered the "sexiest" job of the 21st century, but it requires an understanding of many elements of data analytics. This workshop introduced basic concepts, such as SQL and NoSQL, MapReduce, Hadoop, data mining, machine learning, and data visualization.
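To make the MapReduce concept from this workshop concrete, here is a minimal single-process sketch of the classic word-count example, assuming plain Python rather than Hadoop. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are illustrative only and are not taken from the workshop materials; a real framework distributes each phase across many machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Emit a (word, 1) pair for every lowercased, whitespace-separated token
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    # Group values by key, as the framework does between the map and reduce steps
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the emitted counts for each word
    return {key: sum(values) for key, values in groups.items()}

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))

counts = word_count(["big data lake", "data lake governance"])
print(counts)
```

Because each map call touches only one line and each reduce call touches only one key, both phases can run in parallel; the shuffle is the only step that needs to see all the data.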
For notes and exercises from this workshop, see: https://github.com/Caserta-Concepts/ds-workshop.
For more information, visit our website at www.casertaconcepts.com
How do you balance the need for structured and rule-based governance to assure enterprise data quality - with the imperative to innovate in order to stay relevant and competitive in today's business marketplace?
At the recent CDO Summit in NYC, a range of C-Level Executives across a variety of industries came to hear Joe Caserta, president of Caserta Concepts, put it all in perspective.
Joe talked about the challenges of "data sprawl" and the paradigm shift underway in the evolving big data and data-driven world.
For more information or to contact us, visit http://casertaconcepts.com/
Presentation at Data Summit 2015 in NYC.
Elliott Cordo shared real-world insights across a range of topics, including the evolving best practices for building a data warehouse on Hadoop that also coexists with multiple processing frameworks and additional non-Hadoop storage platforms, the place for massively parallel-processing and relational databases in analytic architectures, and the ways in which the cloud offers the ability to quickly and cost-effectively establish a scalable platform for your Big Data warehouse.
For more information, visit www.casertaconcepts.com
Defining and Applying Data Governance in Today’s Business Environment by Caserta
Caserta Concepts President Joe Caserta was featured at the Data Governance Winter 2014 Conference with a session on the basic steps necessary for data quality and data governance success.
For more information on the event and presentation: http://ow.ly/G3N9N
For more information on the services and solutions offered by Caserta Concepts, visit http://casertaconcepts.com/.
Slides from a recent Big Data Warehousing Meetup titled "Big Data Analytics with Microsoft."
See Power Pivot, Power Query, Power View, Power Maps, and Azure Machine Learning used to analyze Big Data.
One challenge of a Big Data project is acquiring both structured and unstructured information in order to find the right correlations. During the event, we explained all the steps to build your model and enhance your existing data through Microsoft's Power BI.
We had an in-depth discussion about the innovations built into the latest Microsoft Business Intelligence stack, along with practical tips from Microsoft Technology Specialists.
The session also featured demos to help you see the technology as an end-to-end solution.
For more information, visit www.casertaconcepts.com
The Right Data Warehouse: Automation Now, Business Value Thereafter by Inside Analysis
The Briefing Room with Dr. Robin Bloor and WhereScape
Live Webcast on April 1, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=7b23b14b532bd7be60a70f6bd5209f03
In the Big Data shuffle, everyone is looking at Hadoop as “the answer” to collect interesting data from a new set of sources. While Hadoop has given organizations the power to gather more information assets than ever before, the question still looms: which data, regardless of source, structure, volume and all the rest, are significant for affecting business value – and how do we harness it? One effective approach is to bolster the data warehouse environment with a solution capable of integrating all the data sources, including Hadoop, and automating delivery of key information into the right hands.
Register for this episode of The Briefing Room to hear veteran Analyst Robin Bloor as he explains how a rapidly changing information landscape impacts data management. He will be briefed by Mark Budzinski of WhereScape, who will tout his company’s data warehouse automation solutions. Budzinski will discuss how automation can be the cornerstone for closing the gap between those responsible for data management and the people driving business decisions.
Visit InsideAnalysis.com for more information.
The 20th annual Enterprise Data World (EDW) Conference took place in San Diego on April 17-21. It is recognized as the most comprehensive educational conference on data management in the world.
Joe Caserta was a featured presenter. His session, "Evolving from the Data Warehouse to Big Data Analytics - the Emerging Role of the Data Lake," highlighted the challenges of, and steps needed for, becoming a data-driven organization.
Joe also participated in two panel discussions during the show:
• "Data Lake or Data Warehouse?"
• "Big Data Investments Have Been Made, But What's Next"
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote by Caserta
The “Big Data era” has ushered in an avalanche of new technologies and approaches for delivering information and insights to business users. What is the role of the cloud in your analytical environment? How can you make your migration as seamless as possible? This closing keynote, delivered by Joe Caserta, a prominent consultant who has helped many global enterprises adopt Big Data, provided the audience with the inside scoop needed to supplement data warehousing environments with data intelligence—the amalgamation of Big Data and business intelligence.
This presentation was given as the closing keynote at DBTA's annual Data Summit in NYC.
Building a New Platform for Customer Analytics by Caserta
Caserta Concepts and Databricks partner up to bring you this insightful webinar on how a business can choose from all of the emerging big data technologies to figure out which one best fits its needs.
Big Data, NoSQL, NewSQL & The Future of Data Management by Tony Bain
It is an exciting and interesting time to be involved in data. More influential change has occurred in database management in the last 18 months than in the previous 18 years. New technologies such as NoSQL & Hadoop and radical redesigns of existing technologies, like NewSQL, will dramatically change how we manage data moving forward.
These technologies bring with them possibilities both in terms of the scale of data retained but also in how this data can be utilized as an information asset. The ability to leverage Big Data to drive deep insights will become a key competitive advantage for many organisations in the future.
Join Tony Bain as he takes us through the high-level drivers for the changes in technology, how these are relevant to the enterprise, and an overview of the possibilities a Big Data strategy can start to unlock.
The buzzword of the past year is “Data Science”. But what does it really mean? What does a “Data Scientist” do? What tools does Microsoft provide? And what other tools are there beyond Microsoft?
The New Frontier: Optimizing Big Data Exploration by Inside Analysis
The Briefing Room with Dr. Robin Bloor and Cirro
Live Webcast on February 11, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=0ec1fa381886313cc06d841015c65898
As information ecosystems continue to expand, businesses are searching for ways to combine traditional analytics with a new source of insight: Big Data. But with data flooding in from all kinds of sources, fast access and performance at scale can easily become an issue. One effective approach for solving this challenge is data federation, a method that involves taking the analytical processing to the data, allowing streamlined access to multiple data sources without the expensive ETL overhead or building of semantic layers.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor as he explains how the prevalence of distributed data calls for a new approach to Big Data. He will be briefed by Mark Theissen of Cirro, who will tout his company’s Data Hub, a data federation solution that provides a single point of access to all enterprise data assets without excessive data movements, preprocessing or staging. He will discuss how data federation differs from virtualization and ETL approaches, and demonstrate how a Cirro deployment solves the analytics challenge of integrating data silos across the data center – and the cloud – using the BI tools you already have on your desktop for real-time distributed analytics.
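As a rough illustration of the federation idea described above (querying several sources in place instead of copying them through ETL first), the sketch below uses SQLite's `ATTACH DATABASE` to join two separate database files through one connection. This is an assumption-laden stand-in, not how Cirro's Data Hub works: both sources here are SQLite files, whereas a real federation layer spans heterogeneous engines. The file names, tables, and the `revenue_by_region` and `demo` helpers are hypothetical.

```python
import os
import sqlite3
import tempfile

def _create_db(path, ddl, insert_sql, rows):
    # Build a small standalone database file standing in for one source system
    conn = sqlite3.connect(path)
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    conn.close()

def revenue_by_region(crm_path, sales_path):
    # Open one source and attach the other; the join reads both files in place
    conn = sqlite3.connect(crm_path)
    try:
        conn.execute("ATTACH DATABASE ? AS sales", (sales_path,))
        return conn.execute(
            """
            SELECT c.region, SUM(o.amount)
            FROM customers AS c
            JOIN sales.orders AS o ON o.customer_id = c.id
            GROUP BY c.region
            ORDER BY c.region
            """
        ).fetchall()
    finally:
        conn.close()

def demo(workdir):
    # Hypothetical CRM and sales sources, kept in two separate files
    crm = os.path.join(workdir, "crm.db")
    sales = os.path.join(workdir, "sales.db")
    _create_db(crm, "CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)",
               "INSERT INTO customers VALUES (?, ?)",
               [(1, "East"), (2, "West"), (3, "East")])
    _create_db(sales, "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
               "INSERT INTO orders VALUES (?, ?)",
               [(1, 100.0), (2, 50.0), (3, 25.0), (1, 10.0)])
    return revenue_by_region(crm, sales)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(demo(workdir))
```

Neither source table is copied into the other; the query engine reads both at query time, which is the trade-off federation makes against staging data in a warehouse.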
Visit InsideAnalysis.com for more information.
Similar to Big Data: Setting Up the Big Data Lake
Using Machine Learning & Spark to Power Data-Driven Marketing by Caserta
Joe Caserta provides a statistically driven model for understanding the customer path to purchase, which combines online, offline, and third-party data sources. He shows how customer data is fed to machine learning, which assigns weighted credit to customer interactions in order to give insight into which marketing activities truly matter. This presentation is from Caserta's February 2018 Big Data Warehousing Meetup co-hosted with Databricks.
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C... by Caserta
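To show what "assigning weighted credit to customer interactions" means mechanically, here is a small sketch of position-based (U-shaped) attribution in plain Python. Note the swap: this is a fixed rule-based weighting, not the machine-learning model the talk describes, where the weights would be learned from data. The 40/20/40 split and the `position_based_credit` and `attribute` names are assumptions chosen for illustration.

```python
from collections import defaultdict

def position_based_credit(path, first=0.4, last=0.4):
    """Split one conversion's credit (1.0) across an ordered list of touchpoints."""
    n = len(path)
    if n == 0:
        return {}
    credit = defaultdict(float)
    if n == 1:
        credit[path[0]] += 1.0
    elif n == 2:
        # With no middle touches, split all credit between first and last
        credit[path[0]] += first / (first + last)
        credit[path[1]] += last / (first + last)
    else:
        # First and last touches get fixed shares; the remainder is spread evenly
        middle_share = (1.0 - first - last) / (n - 2)
        credit[path[0]] += first
        credit[path[-1]] += last
        for channel in path[1:-1]:
            credit[channel] += middle_share
    return dict(credit)

def attribute(paths):
    """Sum per-channel credit over many converting customer paths."""
    totals = defaultdict(float)
    for path in paths:
        for channel, share in position_based_credit(path).items():
            totals[channel] += share
    return dict(totals)

paths = [
    ["search", "email", "display", "search"],
    ["social", "email"],
]
print({channel: round(share, 2) for channel, share in attribute(paths).items()})
```

Each converting path contributes exactly 1.0 in total, so channel totals sum to the number of conversions; a learned model would replace the fixed weights while keeping that accounting.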
Joe Caserta explores the world of analytics, tech, and AI to paint a picture of where business is headed. This presentation is from the CDAO Exchange in Miami 2018.
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017 by Caserta
Over the past eight or nine years, applying DevOps practices to various areas of technology within business has grown in popularity and produced demonstrable results. These principles are particularly fruitful when applied to a data analytics environment. Bob Eilbacher explains how to implement a strong DevOps practice for data analysis, starting with the necessary cultural changes that must be made at the executive level and ending with an overview of potential DevOps toolchains. Bob also outlines why DevOps and disruption management go hand in hand.
Topics include:
- The benefits of a DevOps approach, with an emphasis on improving quality and efficiency of data analytics
- Why the push for a DevOps practice needs to come from the C-suite and how it can be integrated into all levels of business
- An overview of the best tools for developers, data analysts, and everyone in between, based on the business’s existing data ecosystem
- The challenges that come with transforming into an analytics-driven company and how to overcome them
- Practical use cases from Caserta clients
This presentation was originally given by Bob at the 2017 Strata Data Conference in New York City.
General Data Protection Regulation - BDW Meetup, October 11th, 2017 by Caserta
Caserta Presentation:
General Data Protection Regulation (GDPR) is a business and technical challenge for companies worldwide - and the deadlines are coming fast! American institutions that do business in the EU or have customers from the EU will have their data practices affected. With this in mind, Caserta – joined by Waterline Data, Salt Recruiting, and Squire Patton Boggs – hosted a BDW Meetup on the GDPR, which is perhaps the most controversial data legislation that has been passed to date.
Joe Caserta, Founding President, Caserta, spoke on the basics of the GDPR, how it will impact data privacy around the world, and some techniques geared towards compliance.
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT... by Caserta
The role of the Chief Data Officer (CDO) has become integral to the evolution needed to turn a wisdom-driven company into an analytics-driven company. With Data Governance at the core of your responsibility, moving the innovation meter is a global challenge among CDOs. Specifically, the CDO must:
• Provide a single point of accountability for data initiatives and issues
• Innovate ways to use existing data and evangelize a data vision for the organization
• Support & enforce data governance policies via outreach, training & tools
• Work with IT to develop/maintain an enterprise data repository
• Set standards for analytical reporting and generate data insights through data science
In this session, Joe Caserta addresses real-world CDO challenges and shares techniques to overcome them, manage corporate disruption, and achieve success.
Introduction to Data Science (Data Summit, 2017) by Caserta
At DBTA's 2017 Data Summit in New York, NY, Caserta Founder & President, Joe Caserta, and Senior Architect, Bill Walrond, gave a pre-conference workshop presenting the ins and outs of data science. Data scientist has been dubbed the "sexiest" job of the 21st century, but it requires an understanding of many different elements of data analysis. This presentation dives into the fundamentals of data exploration, mining, and preparation, applying the principles of statistical modeling and data visualization in real-world applications.
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017 by Caserta
Looker Presentation
Caserta Concepts, Blue Apron, and Looker are joining forces to recount the journey of an enterprise-level migration to the cloud with exciting new technologies. This is the analytics platform of the future.
Greg Wells shares Caserta's view on Enterprise Cloud Adoption and how this is impacting BI and the data architecture needed to support big data analytics. Daniel Mintz, Looker’s Chief Data Evangelist, presents a comprehensive introduction to Looker, and Jason Jho, Head of Data Engineering, Blue Apron, gives the audience an in-depth look at the environment that is helping Blue Apron achieve success by using data to become an analytics-driven company.
There is an overwhelming list of expectations – and challenges – in this new, emerging and evolving role. In this presentation, given at the 2016 CDO Summit, Joe Caserta focuses on:
- Defining the CDO title
- Outlining the skills that enhance chances for success
- Listing all the many things the company thinks you are responsible for
- Providing an overview of the core technologies you need to be familiar with, which will ultimately support your success
- Presenting a concise list of the most pressing challenges
- Sharing insights and arguments for how best to meet the challenges and succeed in your new role
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016 by Caserta
Caserta Concepts Founder and President, Joe Caserta, gave this presentation at Strata + Hadoop World 2016 in New York, NY. His session covers path-to-purchase analytics using a data lake and Spark.
For more information, visit http://casertaconcepts.com/
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendees learned how a global leader in the test, measurement, and control systems market reduced its big data implementation time from 18 months to just a few.
Speakers shared how to provide a business-user-friendly, self-service environment for data discovery and analytics, and focused on how to extend and optimize Hadoop-based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft - Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
During this Big Data Warehousing Meetup, Caserta Concepts and Databricks addressed the number one operational and analytic goal of nearly every organization today – to have a complete view of every customer. Customer Data Integration (CDI) must be implemented to cleanse and match customer identities within and across various data systems. CDI has been a long-standing data engineering challenge, not just one of logic and complexity but also of performance and scalability.
The speakers brought together best practice techniques with Apache Spark to achieve complete CDI.
Speakers:
Joe Caserta, President, Caserta Concepts
Kevin Rasmussen, Big Data Engineer, Caserta Concepts
Vida Ha, Lead Solutions Engineer, Databricks
The sessions covered a series of problems that are adequately solved with Apache Spark, as well as those that require additional technologies to implement correctly. Topics included:
· Building an end-to-end CDI pipeline in Apache Spark
· What works, what doesn’t, and how we use Spark as we evolve
· Innovation with Spark including methods for customer matching from statistical patterns, geolocation, and behavior
· Using PySpark and Python’s rich module ecosystem for data cleansing, standardization, and matching
· Using GraphX for matching and scalable clustering
· Analyzing large data files with Spark
· Using Spark for ETL on large datasets
· Applying Machine Learning & Data Science to large datasets
· Connecting BI/Visualization tools to Apache Spark to analyze large datasets internally
The speakers also touched on data governance, on-boarding new data rapidly, how to balance rapid agility and time to market with critical decision support and customer interaction. They also shared examples of problems that Apache Spark is not optimized for.
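The core CDI steps listed above (cleanse and standardize, match, then cluster matched records into one customer) can be sketched in plain Python to show the shape of the pipeline. This is not the speakers' Spark code: pairwise comparison here runs in a single process, and a small union-find stands in for the connected-components clustering that the talk assigns to GraphX. The blocking keys, the 0.85 similarity threshold, and all function and field names are hypothetical.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

def standardize(record):
    # Lowercase, strip punctuation, and collapse whitespace in the name; normalize email
    name = re.sub(r"[^a-z ]", "", record["name"].lower())
    name = " ".join(name.split())
    email = record.get("email", "").strip().lower()
    return {"id": record["id"], "name": name, "email": email}

def is_match(a, b, threshold):
    # Exact match on a non-empty email, otherwise fuzzy similarity on the name
    if a["email"] and a["email"] == b["email"]:
        return True
    return SequenceMatcher(None, a["name"], b["name"]).ratio() >= threshold

def _find(parent, x):
    # Union-find root lookup with path halving
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def cluster_customers(records, threshold=0.85):
    clean = [standardize(r) for r in records]
    parent = {r["id"]: r["id"] for r in clean}
    # Blocking: only compare records that share a name initial or an exact email
    blocks = {}
    for r in clean:
        blocks.setdefault(("name", r["name"][:1]), []).append(r)
        if r["email"]:
            blocks.setdefault(("email", r["email"]), []).append(r)
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if is_match(a, b, threshold):
                # Link matched records; clusters are the connected components
                parent[_find(parent, b["id"])] = _find(parent, a["id"])
    clusters = {}
    for r in clean:
        clusters.setdefault(_find(parent, r["id"]), []).append(r["id"])
    return sorted(sorted(ids) for ids in clusters.values())

records = [
    {"id": 1, "name": "Jane Smith", "email": "jane@example.com"},
    {"id": 2, "name": "jane  smith."},
    {"id": 3, "name": "J. Smith", "email": " JANE@example.com"},
    {"id": 4, "name": "Bob Jones", "email": "bob@example.org"},
]
print(cluster_customers(records))
```

Records 2 and 3 never match each other directly, yet both land in Jane's cluster through record 1; that transitive linking is why CDI treats matching as a graph problem rather than a set of independent pairs.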
For more information on the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/
Moving Past Infrastructure Limitations Presented by MediaMath
This presentation was given at a Big Data Warehousing Meetup with Caserta Concepts, MediaMath and Qubole. You can learn more about the event here: http://www.meetup.com/Big-Data-Warehousing/events/228372516/
Event description:
At Caserta Concepts, we are firm believers in big data thriving on the cloud. The instant-on, nearly unlimited storage and computing capabilities of AWS have made it the de facto solution for a full spectrum of organizations needing to process large amounts of data.
What's more, an ecosystem of value-added platforms has emerged to further ease and democratize the implementation of cloud based solutions. Qubole has developed a great platform for easily deploying and managing ephemeral and long-lived Hadoop and Spark clusters on AWS.
Moving Past Infrastructure Limitations: Data Warehousing at MediaMath
Over the past year and a half, MediaMath has undertaken a “data liberation” effort in an attempt to leave their big-box, monolithic data warehouse behind. In this talk, Rory Sawyer, Software Engineer at MediaMath, will describe how this effort transformed MediaMath’s legacy architecture and legacy mindset, which imposed harsh inefficiencies on data sharing and utilization. The current mindset removes these inefficiencies and allows them to say “yes” to more projects and ideas.
Rory will also demo how MediaMath uses Amazon Web Services and Qubole so that infrastructure is no longer a limiting factor on what and how users query. This combination allows them to scale their resources up and down as needed while bridging different data sources and execution engines. Using and extending MediaMath’s data warehousing is no longer a privileged activity but an ability that every employee and client has.
Introducing Kudu, Big Data Warehousing Meetup by Caserta
Not just an SQL interface or file system, Kudu, the new updatable column store for Hadoop, is changing the storage landscape. It's easy to operate and makes new data immediately available for analytics or operations.
At the Caserta Concepts Big Data Warehousing Meetup, our guests from Cloudera outlined the functionality of Kudu and talked about why it will become an integral component in big data warehousing on Hadoop.
To learn more about what Caserta Concepts has to offer, visit http://casertaconcepts.com/
During a Big Data Warehousing Meetup in NYC, Elliott Cordo, Chief Architect at Caserta Concepts, discussed emerging trends in real-time data processing. The presentation included processing frameworks such as Spark and Storm, as well as datastore technologies ranging from NoSQL to Hadoop. He also discussed exciting new AWS services such as Lambda, Kinesis, and Kinesis Firehose.
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!! by Caserta
Joe Caserta went over the details inside the big data ecosystem and the Caserta Concepts Data Pyramid, which includes Data Ingestion, Data Lake/Data Science Workbench and the Big Data Warehouse. He then dove into the foundation of dimensional data modeling, which is as important as ever in the top tier of the Data Pyramid. Topics covered:
- The 3 grains of Fact Tables
- Modeling the different types of Slowly Changing Dimensions
- Advanced Modeling techniques like Ragged Hierarchies, Bridge Tables, etc.
- ETL Architecture
He also talked about ModelStorming, a technique used to quickly convert business requirements into an Event Matrix and Dimensional Data Model.
This was a jam-packed, abbreviated version of the four days of rigorous training in these techniques to be taught in September by Joe Caserta (Co-Author, with Ralph Kimball, The Data Warehouse ETL Toolkit) and Lawrence Corr (Author, Agile Data Warehouse Design).
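Modeling Slowly Changing Dimensions is one of the topics above. As a minimal sketch, assuming an in-memory Python list in place of a warehouse table, the code below applies a Type 2 change: the current row is expired and a new row with a fresh surrogate key preserves history. The `CustomerDimRow` fields and the `apply_scd2` helper are hypothetical, and this is not the ETL design taught in the course.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional

@dataclass(frozen=True)
class CustomerDimRow:
    surrogate_key: int              # warehouse-assigned key, new for every version
    customer_id: str                # natural (business) key from the source system
    city: str                       # the attribute whose history we track
    effective_from: date
    effective_to: Optional[date]    # None means the row is still current
    is_current: bool

def apply_scd2(dim: List[CustomerDimRow], customer_id: str,
               new_city: str, as_of: date) -> List[CustomerDimRow]:
    """Return a new dimension list with a Type 2 change applied."""
    rows = list(dim)
    current = next((r for r in rows
                    if r.customer_id == customer_id and r.is_current), None)
    if current is not None and current.city == new_city:
        return rows  # tracked attribute unchanged: no new version
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    if current is not None:
        # Expire the old version instead of overwriting it (that would be Type 1)
        rows[rows.index(current)] = replace(current, effective_to=as_of,
                                            is_current=False)
    rows.append(CustomerDimRow(next_key, customer_id, new_city, as_of, None, True))
    return rows

dim = apply_scd2([], "C1", "Boston", date(2020, 1, 1))
dim = apply_scd2(dim, "C1", "New York", date(2023, 6, 1))
for row in dim:
    print(row)
```

One design choice to note: the expired row's `effective_to` equals the new row's `effective_from`, so range lookups should use a half-open interval; some teams instead end the old row one day earlier.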
For more information, visit http://casertaconcepts.com/.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf by 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Accelerate your Kubernetes clusters with Varnish Caching by Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... by Ramesh Iyer
In today's fast-changing business world, companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
The Art of the Pitch: WordPress Relationships and Sales by Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... by James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
3. @joe_Caserta
Caserta Timeline
• Began consulting: database programming and data modeling; 25+ years of hands-on experience building database solutions
• Data Analysis, Data Warehousing and Business Intelligence since 1996
• Founded Caserta Concepts
• Co-author, with Ralph Kimball, of The Data Warehouse ETL Toolkit
• Web log analytics solution published in Intelligent Enterprise
• Launched Big Data practice
• Launched Data Science, Data Interaction and Cloud practices
• Laser focus on extending Data Analytics with Big Data solutions
• Launched Big Data Warehousing (BDW) Meetup NYC: 3,000+ members
• Established best practices for big data ecosystem implementations
• Dedicated to Data Governance Techniques on Big Data (Innovation)
• Top 20 Big Data Consulting - CIO Review
• Top 20 Most Powerful Big Data consulting firms
• 2015: Awarded for getting data out of SAP for data analytics
• Awarded Top Healthcare Analytics Solution Provider
Timeline years: 1986, 1996, 2001, 2004, 2009, 2012, 2013, 2014, 2015
4. @joe_Caserta
About Caserta Concepts
• Consulting firm with focused expertise on Data Innovation, using Modern Data Engineering approaches to solve highly complex business data challenges
• Award-winning company
• Internationally recognized workforce
• Mentoring, Training, Knowledge Transfer
• Strategy, Architecture, Implementation
• An Innovation Partner
• Transformative Data Strategies
• Modern Data Engineering
• Advanced Architecture
• Leaders in architecting and implementing enterprise data solutions
• Data Warehousing
• Business Intelligence
• Big Data Analytics
• Data Science
• Data on the Cloud
• Data Interaction & Visualization
• Strategic Consulting
• Technical Design
• Build & Deploy Solutions
8. @joe_Caserta
Caserta Innovation Lab (CIL)
• Internal laboratory established to test & develop solution concepts and ideas
• Used to accelerate client projects
• Examples:
• Search (SOLR) based BI
• Big Data Governance Toolkit
• Text Analytics on Social Network Data
• Continuous Integration / End-to-end streaming (Spark)
• Recommendation Engine Optimization
• Relationship Intelligence (Graph DB/Search)
• Others (confidential)
• CIL is hosted on
10. @joe_Caserta
As a Mindful Cyborg, Chris utilizes up to 700 sensors, devices, applications, and services to track, analyze, and optimize as many areas of his existence as possible.
This quantification enables him to see the connections in otherwise invisible data, resulting in dramatic upgrades to his health, productivity, and quality of life.
The Future is Today
11. @joe_Caserta
The Progression of Data Analytics
Business value rises with data analytics sophistication (Source: Gartner):
• Descriptive Analytics: What happened? (Reports)
• Diagnostic Analytics: Why did it happen? (Correlations)
• Predictive Analytics: What will happen? (Predictions)
• Prescriptive Analytics: How can we make it happen? (Recommendations)
Cognitive Computing / Cognitive Data Analytics
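To make the first and third steps of this progression concrete, here is a minimal Python sketch contrasting descriptive analytics (summarizing what happened) with a very simple predictive step (extrapolating a least-squares trend one period ahead). The sales series and function names are hypothetical, and a straight-line trend is only a stand-in for whatever predictive model a real project would use.

```python
from statistics import mean


def describe(values: list[float]) -> dict[str, float]:
    """Descriptive analytics: summarize what happened."""
    return {"total": sum(values), "mean": mean(values), "max": max(values)}


def predict_next(values: list[float]) -> float:
    """Predictive analytics (simplified): fit y = a + b*x by least squares
    over x = 0..n-1, then extrapolate to x = n."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    var = sum((x - x_bar) ** 2 for x in xs)
    slope = cov / var
    intercept = y_bar - slope * x_bar
    return intercept + slope * n


monthly_sales = [10.0, 12.0, 14.0, 16.0]  # hypothetical data
print(describe(monthly_sales))
print(predict_next(monthly_sales))
```

Diagnostic and prescriptive analytics would add "why" (for example, correlating the trend with other variables) and "what should we do" on top of these two building blocks.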
12. @joe_Caserta
Innovation is the only sustainable competitive advantage a company can have
Innovations may fail, but companies that don’t innovate will fail
14. @joe_Caserta
What you need to know (according to Joe)
Hadoop Distribution: Apache, Cloudera, Hortonworks, MapR, IBM
Tools:
Hive: Map data to structures and use SQL-like queries
Pig: Data transformation language for big data
Sqoop: Extracts data from external sources and loads it into Hadoop
Storm: Real-time ETL
Spark: General-purpose cluster computing framework
NoSQL:
Document: MongoDB, CouchDB
Graph: Neo4j, Titan
Key Value: Riak, Redis
Columnar: Cassandra, HBase
Search: Lucene, Solr, Elasticsearch
Languages: Python, Java, R, Scala
15. @joe_Caserta
The Evolution of Modern Data Engineering
Traditional BI: source systems (Enrollments, Claims, Finance, Others…) → ETL → Traditional EDW → Ad-Hoc Query and Canned Reporting
Big Data Analytics: a Horizontally Scalable Environment, Optimized for Analytics
• Big Data Lake on the Hadoop Distributed File System (HDFS), spread across nodes N1 to N5
• Processing with Spark, MapReduce, Pig/Hive
• ETL into NoSQL Databases
• Ad-Hoc/Canned Reporting
• Data Science
16. @joe_Caserta
How We’ve Built Data Warehouses
•Design – Top Down / Bottom Up
• Customer Interviews and requirements gathering
• Data Profiling
•Extract Transform Load data from source to data warehouse
•Create Facts and Dimensions
•Put a BI tool on top
•Develop reports
•Data Governance
17. @joe_Caserta
The Traditional Conversation
• Kimball Vs. Inmon
• Dimensional vs. 3rd Normal Form
• What hardware do we need (that will be ready in 6 months)?
• Oracle vs. SQL Server, or Postgres or MySQL if we were brave (and cheap)
• Which ETL tool should we BUY: Informatica or DataStage?
• Which BI tool should we put on top: Business Objects or Cognos?
18. @joe_Caserta
The New Conversation
• Do we need a Data Warehouse at all?
• If we do, does it need to be relational?
• Should we leverage Hadoop or NoSQL?
• Which platform and language are we going to code in?
• Which bleeding-edge Apache project should we put in production?
19. @joe_Caserta
Why Change?
New technologies are great and all... but what drives our adoption of new technologies and techniques?
• Data has changed: semi-structured, unstructured, sparse and evolving schemas
• Volumes have changed: GB to TB to PB workloads
• Cracks in the Armor of the Traditional Data Warehousing approach!
AND MOST IMPORTANTLY:
Companies that innovate to leverage their data win!
20. @joe_Caserta
Cracks in the Data Warehouse Armor
• Onboarding new data is difficult!
• Data structures are rigid!
• Data Governance is slow!
• Disconnected from business needs:
“Hey – I need to munge some new data to see if it has value”
Wait! We have to….
Profile, analyze and conform the data
Change data models and load it into dimensional models
Build a semantic layer – that nobody is going to use
Create a dashboard we hope someone will notice
...and then, 3-6 months later, you can have at it to see if it has value!
21. @joe_Caserta
Is Anyone Surprised?
DWs have a 70% FAILURE RATE
• Semi-scientific analysis has shown that the majority of data analytics projects fail...
• And of those that don’t fail, only a fraction are deemed a “success”; the others just finish!
• Data is just REALLY hard, especially without the right strategy
What do we think the Data Governance failure rate is?
22. @joe_Caserta
Is Traditional Warehousing All Wrong?
NO!
The concept of a Data Warehouse is sound:
•Consolidating data from disparate source systems
•Clean and conformed reference data
•Clean and integrated business facts
•Data governance (a more pragmatic version)
We can be more successful by acknowledging that the EDW can’t solve all problems.
23. @joe_Caserta
So what’s missing?
The Data Lake
A storage and processing layer for all data
• Store anything: source data, semi-structured, unstructured, structured
• Keep it as long as needed
• Support a number of processing workloads
• Scale-out
...and here is where Hadoop can help us!
24. @joe_Caserta
Hadoop (Typically) Powers the Data Lake
Hadoop Provides us:
• Distributed storage: HDFS
• Resource management: YARN
• Many workloads, not just MapReduce
25. @joe_Caserta
Governing Big Data
Before Data Governance
Users trying to produce reports from raw source data
No Data Conformance
No Master Data Management
No Data Quality processes
No Trust: two analysts were almost guaranteed to come up with two different sets of numbers!
Before Big Data Governance
We can put “anything” in Hadoop
We can analyze anything
We’re scientists, we don’t need IT, we make the rules
Rule #1: Dumping data into Hadoop with no repeatable process, procedure, or governance will create a mess
Rule #2: Information harvested from an ungoverned system will take us back to the old days: No Trust = Not Actionable
26. @joe_Caserta
Data Governance
• Organization: This is the ‘people’ part. Establishing an Enterprise Data Council, Data Stewards, etc.
• Metadata: Definitions, lineage (where does this data come from), business definitions, technical metadata
• Privacy/Security: Identify and control sensitive data, regulatory compliance
• Data Quality and Monitoring: Data must be complete and correct. Measure, improve, certify
• Business Process Integration: Policies around data frequency, source availability, etc.
• Master Data Management: Ensure consistent business-critical data, e.g. Members, Providers, Agents, etc.
• Information Lifecycle Management (ILM): Data retention, purge schedule, storage/archiving
Data Governance for Big Data
• Add Big Data to the overall framework and assign responsibility
• Add data scientists to the Stewardship program
• Assign stewards to new data sets (Twitter, call center logs, etc.)
• Graph databases are more flexible than relational
• Lower latency service required
• Distributed data quality and matching algorithms
• Data Quality and Monitoring (probably home grown, Drools?)
• Quality checks not only in SQL: machine learning, Pig and MapReduce
• Acting on large dataset quality checks may require distribution
• Larger scale
• New datatypes
• Integrate with Hive Metastore, HCatalog, home grown tables
• Secure and mask multiple data types (not just tabular)
• Deletes are less common (unless there is a regulatory requirement)
• Take advantage of compression and archiving (like AWS Glacier)
• Data detection and masking on unstructured data upon ingest
• Near-zero latency, DevOps, core component of business operations
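"Data detection and masking on unstructured data upon ingest" can be pictured with a minimal Python sketch: free text passes through a set of detection rules and matches are replaced before the text lands. The two regular expressions (a US SSN-like pattern and an email pattern) and the function name are illustrative assumptions only; a real program would rely on a vetted catalog of sensitive-data patterns.

```python
import re

# Hypothetical detection rules: (pattern, replacement). Illustrative only.
SENSITIVE_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "***-**-****"),         # US SSN-like
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "[EMAIL]"),  # email address
]


def mask_on_ingest(text: str) -> str:
    """Detect and mask sensitive values in free text before it is stored."""
    for pattern, replacement in SENSITIVE_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


note = "Caller gave SSN 123-45-6789 and email jane.doe@example.com"
print(mask_on_ingest(note))
```

Masking at ingest (rather than at query time) means the raw sensitive values never reach the landing area, which simplifies the Privacy/Security and ILM obligations for everything downstream.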
27. @joe_Caserta
Making it Right
The promise is an “agile” data culture where communities of users are encouraged to explore new datasets in new ways
New tools
External data
Data blending
Decentralization
With all the V’s, data scientists, new tools and new data, we must rely LESS on HUMANS
We need more systemic administration
We need systems and tools to help with big data governance
This space is EXTREMELY immature!
Steps towards Data Governance for the Data Lake
1. Establish the difference between traditional data governance and big data governance
2. Establish basic rules for where new data governance can be applied
3. Establish processes for graduating the products of data science to governance
4. Establish a set of tools to make governing Big Data feasible
29. @joe_Caserta
Data Lake Governance Realities
Full data governance can only be applied to “Structured” data
The data must have a known and well documented schema
This can include materialized endpoints such as files or tables OR projections such as a Hive table
Governed structured data must have:
A known schema with Metadata
A known and certified lineage
A monitored, quality-tested, managed process for ingestion and transformation
A governed usage: data isn’t just for enterprise BI tools anymore
We talk about unstructured data in Hadoop, but more often it’s semi-structured or structured with a definable schema.
Even in the case of unstructured data, structure must be extracted/applied in just about every case imaginable before analysis can be performed.
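The first requirement, "a known schema with Metadata", can be sketched in Python as a declared schema that every incoming record is validated against. The `GovernedSchema` class, its fields (owner, lineage) and the claims example are hypothetical illustrations, not a specific governance product's API.

```python
from dataclasses import dataclass, field


@dataclass
class GovernedSchema:
    """Hypothetical schema record: column names/types plus descriptive metadata."""
    name: str
    columns: dict[str, type]
    owner: str                                         # accountable data steward (assumed)
    lineage: list[str] = field(default_factory=list)   # upstream sources (assumed)


def validate(record: dict, schema: GovernedSchema) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, expected in schema.columns.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"{name}: expected {expected.__name__}, "
                            f"got {type(record[name]).__name__}")
    for extra in sorted(record.keys() - schema.columns.keys()):
        problems.append(f"undeclared field: {extra}")
    return problems


claims = GovernedSchema(
    name="claims",
    columns={"claim_id": str, "member_id": str, "amount": float},
    owner="claims-steward",
    lineage=["landing/claims"],
)
print(validate({"claim_id": "C1", "member_id": "M9", "amount": 120.5}, claims))
print(validate({"claim_id": "C2", "amount": "n/a"}, claims))
```

Records that fail validation would be routed back for review rather than flowing into a governed tier; the lineage and quality-process requirements need their own checks.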
32. @joe_Caserta
The Data Scientists Can Help!
Data Science to Big Data Warehouse mapping
Full Data Governance Requirements
• Provide full process lineage
• Data certification process by data stewards and business owners
• Ongoing Data Quality monitoring that includes Quality Checks
Provide requirements for Data Lake
• Proper metadata established: Catalog, Data Definitions, Lineage, Quality monitoring
• Know and validate data completeness
The Big Data Pyramid
Tiers from top to bottom, each with its usage pattern and data governance:
• Big Data Warehouse. Usage: user community arbitrary queries and reporting. Governance: Fully Data Governed (trusted)
• Data Science Workspace. Usage: agile business insight through data-munging, machine learning, blending with external data, development of to-be BDW facts. Governance: Metadata Catalog; ILM (who has access, how long do we “manage it”); Data Quality and Monitoring (monitoring of completeness of data)
• Data Lake: Integrated Sandbox. Usage: data is ready to be turned into information: organized, well defined, complete. Governance: Metadata Catalog; ILM (who has access, how long do we “manage it”); Data Quality and Monitoring (monitoring of completeness of data)
• Landing Area: Source Data in “Full Fidelity”. Usage: raw machine data collection, collect everything. Governance: Metadata Catalog; ILM (who has access, how long do we “manage it”)
Data has different governance demands at each tier.
Only the top tier of the pyramid is fully governed.
We refer to this as the Trusted tier of the Big Data Warehouse.
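One way to make "different governance demands at each tier" operational is to encode each tier's required controls and permitted consumers as data, as in this minimal Python sketch. The control names are hypothetical labels, and the mapping is one reading of this pyramid together with the tier-by-tier slides (catalog and ILM everywhere, completeness from the Data Lake up, accuracy, lineage and steward certification only in the Big Data Warehouse).

```python
from enum import Enum


class Tier(Enum):
    LANDING = "Landing Area"
    DATA_LAKE = "Data Lake"
    WORKSPACE = "Data Science Workspace"
    BDW = "Big Data Warehouse"


BASE = {"metadata_catalog", "ilm"}
COMPLETENESS = BASE | {"completeness_monitoring"}

# Hypothetical control names; one reading of the pyramid and tier slides.
REQUIRED_CONTROLS = {
    Tier.LANDING: BASE,
    Tier.DATA_LAKE: COMPLETENESS,
    Tier.WORKSPACE: COMPLETENESS,
    Tier.BDW: COMPLETENESS | {"accuracy_guarantee", "certified_lineage", "steward_certification"},
}

# Consumers per tier, as listed on the tier-by-tier slides.
CONSUMERS = {
    Tier.LANDING: {"data_scientist", "etl_process", "application"},
    Tier.DATA_LAKE: {"data_scientist", "etl_process", "application", "data_analyst"},
    Tier.WORKSPACE: {"data_scientist"},
    Tier.BDW: {"data_scientist", "etl_process", "application", "data_analyst", "business_user"},
}


def missing_controls(tier: Tier, applied: set[str]) -> set[str]:
    """Controls the tier requires that have not yet been applied."""
    return REQUIRED_CONTROLS[tier] - applied


def may_consume(tier: Tier, role: str) -> bool:
    return role in CONSUMERS[tier]


print(missing_controls(Tier.BDW, {"metadata_catalog", "ilm", "completeness_monitoring"}))
print(may_consume(Tier.WORKSPACE, "business_user"))
```

Keeping the policy as data rather than prose makes it easy to audit which datasets sit in which tier without the controls that tier demands.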
34. @joe_Caserta
Peeling back the layer… The Landing Area
•Source data in its full fidelity
•Programmatically Loaded
•Partitioned for data processing
•No governance other than catalog and ILM (Security and Retention)
•Consumers: Data Scientists, ETL Processes, Applications
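"Programmatically loaded" and "partitioned for data processing" can be sketched in Python as a loader that writes each source file byte-for-byte into a Hive-style partition directory (`dt=YYYY-MM-DD`). The directory layout, source name and function names are assumptions for illustration; on Hadoop the same layout would live in HDFS or S3 rather than a local path.

```python
from datetime import date
from pathlib import Path


def landing_path(root: Path, source: str, load_date: date, filename: str) -> Path:
    """Hive-style partition directory: <root>/<source>/dt=YYYY-MM-DD/<file>."""
    return root / source / f"dt={load_date.isoformat()}" / filename


def land_raw(root: Path, source: str, load_date: date, filename: str, payload: bytes) -> Path:
    """Load a source file unchanged (full fidelity): no parsing or conforming here."""
    target = landing_path(root, source, load_date, filename)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        p = land_raw(Path(tmp), "claims", date(2015, 9, 1), "claims_001.csv",
                     b"id,amount\n1,120.50\n")
        print(p.relative_to(tmp), p.read_bytes())
```

Partitioning by load date lets downstream jobs process, and ILM rules retain or purge, one day of raw data at a time.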
35. @joe_Caserta
Data Lake
•Enriched, lightly integrated
•Data is accessible in the Hive Metastore
• Either processed into tabular relations
• Or via Hive SerDes directly upon Raw Data
•Partitioned for data access
•Governance additionally includes a guarantee of completeness
•Consumers: Data Scientists, ETL Processes, Applications, Data Analysts
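Reading raw data "via Hive SerDes directly" is schema-on-read: the files stay in their raw form and a table definition is applied only when they are read. This Python sketch mimics that idea with a hypothetical column list over pipe-delimited text; it is an analogy, not Hive's SerDe API.

```python
import csv
import io
from typing import Iterator

# Raw landed text, left untouched on "disk".
RAW = "M1|2015-09-01|120.50\nM2|2015-09-02|75.00\n"

# Hypothetical table definition applied at read time, playing the role that
# Hive DDL plus a SerDe plays over files kept in their raw format.
COLUMNS = [("member_id", str), ("service_date", str), ("amount", float)]


def read_with_schema(raw: str, columns=COLUMNS, delimiter: str = "|") -> Iterator[dict]:
    """Project typed, named rows over raw delimited text without rewriting it."""
    for row in csv.reader(io.StringIO(raw), delimiter=delimiter):
        yield {name: cast(value) for (name, cast), value in zip(columns, row)}


rows = list(read_with_schema(RAW))
print(rows[0], sum(r["amount"] for r in rows))
```

Because the raw text is never rewritten, a corrected or extended column list can be applied later without reloading anything; the trade-off is that parsing cost is paid on every read.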
36. @joe_Caserta
A Note On Unstructured Data
• Structure must be extracted/applied in just about every case imaginable before analysis can be performed.
• Full data governance can only be applied to “Structured” data
• This can include materialized endpoints such as files or tables OR projections such as a Hive table
• Governed structured data must have:
• A known schema with Metadata
• A known and certified lineage
• A monitored, quality-tested, managed process for ingestion and transformation
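As an example of extracting structure before analysis, this Python sketch turns web server log lines into records by parsing the Apache Common Log Format. The choice of web logs and the field names are illustrative assumptions; other "unstructured" sources would need their own extraction rules.

```python
import re
from typing import Optional

# Apache Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line: str) -> Optional[dict]:
    """Return a structured record, or None so the caller can route the line for review."""
    m = CLF.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(parse_clf(sample))
```

Once a known schema like this exists, the governance requirements listed above (metadata, lineage, monitored quality checks) can be applied to the extracted records.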
37. @joe_Caserta
Data Science Workspace
•No barrier for onboarding and analysis of new data
•Blending of new data with the entire Data Lake, including the Big Data Warehouse
•Data Scientists enrich data with insight
•Consumers: Data Scientists (cool cats) only!
38. @joe_Caserta
Big Data Warehouse
•Data is Fully Governed
•Data is Structured
•Partitioned/tuned for data access
•Governance includes a guarantee of completeness and accuracy
•Consumers: Data Scientists, ETL Processes, Applications, Data Analysts, and Business Users (the masses)
39. @joe_Caserta
The Refinery
Diagram: BDW, Data Science Workspace, Data Lake and Landing Area, with “Cool new data” and “New Insights”
•The feedback loop between Data Science and Data Warehouse is critical
•Successful work products of science must graduate into the appropriate layers of the Data Lake
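Graduation can be treated as a gate: a data science work product moves into a governed layer only when the full-governance requirements from the earlier slides are met. This Python sketch uses a hypothetical `WorkProduct` record with one flag per requirement; how each flag gets set (catalog entry, lineage tool, steward sign-off, quality monitors) is left open.

```python
from dataclasses import dataclass


@dataclass
class WorkProduct:
    """Hypothetical description of a data science output proposed for promotion."""
    name: str
    has_schema: bool
    lineage_documented: bool
    steward_certified: bool
    quality_checks_passing: bool


def graduation_blockers(wp: WorkProduct) -> list[str]:
    """List the unmet requirements; an empty list means the product may graduate."""
    checks = {
        "known schema with metadata": wp.has_schema,
        "certified lineage": wp.lineage_documented,
        "steward/business-owner certification": wp.steward_certified,
        "monitored quality checks": wp.quality_checks_passing,
    }
    return [requirement for requirement, ok in checks.items() if not ok]


def can_graduate(wp: WorkProduct) -> bool:
    return not graduation_blockers(wp)


churn = WorkProduct("member_churn_score", True, True, False, True)
print(graduation_blockers(churn), can_graduate(churn))
```

Returning the list of blockers, rather than a bare yes/no, gives data scientists and stewards a concrete to-do list, which keeps the feedback loop moving.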
40. @joe_Caserta
Big Data Warehouse Technology?
“Polyglot Persistence - where any decent sized enterprise will have a variety of different data storage technologies for different kinds of data. There will still be large amounts of it managed in relational stores, but increasingly we'll be first asking how we want to manipulate the data and only then figuring out what technology is the best bet for it…”
- Martin Fowler (http://martinfowler.com)
Abridged Version: Use the right tool for the job!
41. @joe_Caserta
Polyglot Warehouse
We promote the concept that the Big Data Warehouse may live in one or more platforms
•Full Hadoop Solutions
•Hadoop plus MPP or Relational
Supplemental technologies:
•NoSQL: Columnar, Key value, Time series, Graph
•Search Technologies
42. @joe_Caserta
Hadoop is the Data Warehouse?
•Hadoop can be the platform for the entire data pyramid, including the landing area, data lake and Big Data Warehouse
•Especially serves as the Data Lake and “Refinery”
•Query engines such as Hive and Impala provide SQL support
43. @joe_Caserta
More Typical: Hadoop + Relational
•Hadoop is the platform for the Data Lake and Refinery
•The Active Set is federated out into an MPP or Relational Platform presentation layer
•Serves as a good model when there is an existing MPP or Relational Data Warehouse in place
44. @joe_Caserta
On the Cloud
AWS and other cloud providers present a very powerful design pattern:
•S3 serves as the storage layer for the Data Lake
•EMR (Elastic MapReduce, AWS's managed Hadoop) provides the Refinery; most clusters can be ephemeral
•The Active Set is stored in Redshift (MPP) or Relational Platforms
Eliminate the massive on-premise appliance footprint
45. @joe_Caserta
Data Warehousing is not Dead!
• The principles of Data Warehousing still make sense
• Recognize gaps in the feature/functionality of the Relational Database and traditional Data Warehousing
• Believe in the Data Lake and accept Tunable Governance
• Think Polyglot Warehouse and use the right tool for the job
46. @joe_Caserta
What skills are needed?
• Modern Data Engineering/Data Preparation
• Domain Knowledge/Business Expertise
• Advanced Mathematics/Statistics
47. @joe_Caserta
What about the tools I have?
People, Processes and Business commitment is still critical!
Caution: Some Assembly Required
The V’s require robust tooling:
Some of the most promising tools are brand new or in incubation!
Enterprise big data implementations typically combine products with some custom-built components
49. @joe_Caserta
High Volume Trade Data Project
• The equity trading arm of a large US bank needed to scale its infrastructure to process/parse trade data in real time and calculate aggregations/statistics
~1.4 million messages/second, ~12 billion messages/day, ~240 billion messages/month
• The solution needed to map the raw data to a data model in memory or with low latency (for real time), while persisting mapped data to disk (for end-of-day reporting).
• The proposed solution also needed to handle ad-hoc data requests for data analytics.
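The three volume figures can be cross-checked with a little arithmetic. This Python sketch assumes about 20 trading days per month and a 6.5-hour equity trading session; neither number is given on the slide. Under those assumptions the monthly figure follows from the daily one, and the daily total corresponds to only about 2.4 hours of traffic at 1.4 million messages/second, which suggests the per-second number describes peak load rather than a sustained rate.

```python
PEAK_RATE = 1.4e6              # messages/second, from the slide
DAILY = 12e9                   # messages/day, from the slide
TRADING_DAYS = 20              # assumption: ~20 trading days per month
SESSION_SECONDS = 6.5 * 3600   # assumption: a 6.5-hour trading session

monthly = DAILY * TRADING_DAYS          # compare with the slide's ~240 billion/month
avg_rate = DAILY / SESSION_SECONDS      # average messages/second over the session
seconds_at_peak = DAILY / PEAK_RATE     # how long the peak rate would take to reach DAILY

print(f"monthly: {monthly:.3g}")
print(f"average rate over session: {avg_rate:,.0f} msg/s")
print(f"daily volume at peak rate: {seconds_at_peak / 3600:.1f} h")
```

Sizing for the peak rather than the average is what drives the real-time (Storm) part of the design.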
50. @joe_Caserta
The Data
• Primarily FIX messages: Financial Information Exchange
• Established in the early '90s as a standard for trade data communication and widely used throughout the industry
• Basically a delimited file of variable attribute-value pairs
• Looks something like this:
8=FIX.4.2 | 9=178 | 35=8 | 49=PHLX | 56=PERS | 52=20071123-05:30:00.000 |
11=ATOMNOCCC9990900 | 20=3 | 150=E | 39=E | 55=MSFT | 167=CS | 54=1 | 38=15 | 40=2 |
44=15 | 58=PHLX EQUITY TESTING | 59=0 | 47=C | 32=0 | 31=0 | 151=15 | 14=0 | 6=0 |
10=128 |
• A single trade can comprise hundreds of such messages, although typical trades have about a dozen
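Because a FIX message is just tag=value pairs, parsing is mostly splitting. This Python sketch parses the sample above, where the separator is shown as " | "; in real FIX traffic the separator is the SOH character (ASCII 0x01). The tag-name table covers only a few standard FIX 4.2 tags that appear in the sample, and the parser deliberately ignores repeating groups, which production FIX engines must handle.

```python
SOH = "\x01"  # the real FIX field delimiter; the slide displays it as " | "

# A few standard FIX 4.2 tags present in the sample message.
TAG_NAMES = {
    8: "BeginString", 35: "MsgType", 49: "SenderCompID", 56: "TargetCompID",
    52: "SendingTime", 55: "Symbol", 54: "Side", 38: "OrderQty", 44: "Price",
    39: "OrdStatus", 10: "CheckSum",
}

SAMPLE = (
    "8=FIX.4.2 | 9=178 | 35=8 | 49=PHLX | 56=PERS | 52=20071123-05:30:00.000 | "
    "11=ATOMNOCCC9990900 | 20=3 | 150=E | 39=E | 55=MSFT | 167=CS | 54=1 | 38=15 | 40=2 | "
    "44=15 | 58=PHLX EQUITY TESTING | 59=0 | 47=C | 32=0 | 31=0 | 151=15 | 14=0 | 6=0 | "
    "10=128 |"
)


def parse_fix(message: str, delimiter: str = SOH) -> dict[int, str]:
    """Split a FIX message into {tag: value}. Simplified: a repeated tag
    overwrites the earlier one, so repeating groups are not supported."""
    fields = {}
    for pair in message.split(delimiter):
        pair = pair.strip()
        if not pair:
            continue
        tag, _, value = pair.partition("=")
        fields[int(tag)] = value
    return fields


def named(fields: dict[int, str]) -> dict[str, str]:
    """Replace known tag numbers with their FIX field names."""
    return {TAG_NAMES.get(tag, str(tag)): value for tag, value in fields.items()}


fields = parse_fix(SAMPLE, delimiter="|")
print(named(fields)["Symbol"], named(fields)["Side"], len(fields))
```

Mapping each parsed message onto an in-memory data model like this is the step the project needed to perform at low latency before persisting the results.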
51. @joe_Caserta
High Volume Real-time Analytics: Solution Architecture
Components: Trade Data, Kafka, Storm Cluster, Data Quality Rules Engine, Atomic data, Aggregates, Event Monitors, d3.js Real-time Analytics, Hadoop Cluster, Low Latency Analytics
• The Kafka messaging system is used for ingestion
• Storm is used for real-time ETL and outputs atomic data and derived data needed for analytics
• Redis is used as a reference data lookup cache
• Real-time analytics are produced from the aggregated data.
• Higher latency ad-hoc analytics are done in Hadoop using Pig and Hive
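The data flow in these bullets can be illustrated with an in-process Python sketch: a list stands in for the Kafka topic, a dictionary stands in for the Redis reference-data cache, and two plain functions play the role of Storm's enrichment and aggregation steps. None of this is the project's code or the Kafka, Storm or Redis APIs; the symbols, sectors and message fields are hypothetical.

```python
from collections import defaultdict
from typing import Iterable

# Stand-in for the Redis reference-data cache (symbol -> sector); hypothetical data.
REFERENCE_CACHE = {"MSFT": "Technology", "JPM": "Financials"}


def enrich(msg: dict) -> dict:
    """Real-time ETL step: attach reference data from the lookup cache."""
    return {**msg, "sector": REFERENCE_CACHE.get(msg["symbol"], "UNKNOWN")}


def aggregate(stream: Iterable[dict]) -> dict[str, dict]:
    """Maintain running per-symbol aggregates as messages arrive."""
    agg = defaultdict(lambda: {"trades": 0, "shares": 0, "notional": 0.0})
    for msg in stream:
        a = agg[msg["symbol"]]
        a["trades"] += 1
        a["shares"] += msg["qty"]
        a["notional"] += msg["qty"] * msg["price"]
    return dict(agg)


topic = [  # stand-in for messages consumed from a Kafka topic
    {"symbol": "MSFT", "qty": 15, "price": 15.0},
    {"symbol": "MSFT", "qty": 10, "price": 16.0},
    {"symbol": "JPM", "qty": 5, "price": 60.0},
]
atomic = [enrich(m) for m in topic]   # atomic data, persisted for later Hadoop analysis
aggregates = aggregate(atomic)        # feeds real-time dashboards and event monitors
print(aggregates)
```

Keeping both outputs mirrors the architecture: aggregates serve the low-latency path, while the enriched atomic records are what the higher-latency Pig/Hive analysis works from.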
52. @joe_Caserta
Electronic Medical Records (EMR) Analytics
Diagram components: Edge Node; 100k files (variant 1..n); Forqlift; Sequence Files; HDFS Put; Hadoop Data Lake; Pig EMR Processor with UDF Library and Python Wrapper; Provider table, Member table and 15 more entities (parquet); Sqoop; Netezza DW with Provider table, Member table, and more dimensions and facts
• Receive Electronic Medical Records from various providers in various formats
• Address Hadoop ‘small file’ problem
• No barrier for onboarding and analysis of new data
• Blend new data with Data Lake and Big Data Warehouse
• Machine Learning
• Text Analytics
• Natural Language Processing
• Reporting
• Ad-hoc queries
• File ingestion
• Information Lifecycle Mgmt
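The Hadoop "small file" problem arises because the NameNode keeps metadata for every file in memory, so 100k tiny records cost far more than one large file holding the same bytes. The diagram addresses this by packing files into Sequence Files with Forqlift. This Python sketch shows the underlying idea (one container of key/value records, key = original file name, value = file contents) using a simple length-prefixed layout that is an assumption for illustration, not Hadoop's actual SequenceFile format.

```python
import io
import struct
from typing import BinaryIO, Iterator

# Assumed record layout: [4-byte key length][key][8-byte value length][value]
KEY_LEN = struct.Struct(">I")
VAL_LEN = struct.Struct(">Q")


def pack(files: dict[str, bytes], out: BinaryIO) -> None:
    """Write many small files into one container stream."""
    for name, data in files.items():
        key = name.encode("utf-8")
        out.write(KEY_LEN.pack(len(key)))
        out.write(key)
        out.write(VAL_LEN.pack(len(data)))
        out.write(data)


def unpack(inp: BinaryIO) -> Iterator[tuple[str, bytes]]:
    """Yield (file name, contents) pairs back out of the container."""
    while True:
        head = inp.read(KEY_LEN.size)
        if not head:
            return
        (klen,) = KEY_LEN.unpack(head)
        key = inp.read(klen).decode("utf-8")
        (vlen,) = VAL_LEN.unpack(inp.read(VAL_LEN.size))
        yield key, inp.read(vlen)


small_files = {f"emr_{i:05d}.xml": f"<record id='{i}'/>".encode() for i in range(3)}
buf = io.BytesIO()
pack(small_files, buf)
buf.seek(0)
print(dict(unpack(buf)) == small_files)
```

Keeping the original file name as the key preserves lineage back to each source record, which matters when downstream processing (the Pig EMR Processor here) has to handle many format variants.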
53. @joe_Caserta
Some Thoughts – Enable the Future
Big Data requires the convergence of data governance, advanced data engineering, data science and business smarts
Make sure your data can be trusted and people can be held accountable for the impact caused by low data quality.
It takes a village to achieve all the tasks required for effective big data strategy & execution
Get experts that have done it before!
Achieve the impossible...
... everything is impossible until someone does it!
54. @joe_Caserta
Workshops: www.casertaconcepts.com/training
Sept 21-22 (2 days), Agile Data Warehousing taught by Lawrence Corr
Sept 23-24 (2 days), ETL Architecture and Design taught by Joe Caserta (Big Data module added)
SAVE $300 by using discount code: DAMANYC
Agile DW & ETL Training in NYC, 2015
New York Executive Conference Center
1601 Broadway @48th St.
New York, NY 10019
The last 2 years have been more exciting than the previous 27.
We focused our attention on building a single version of the truth.
We mainly applied data governance to the EDW itself and a few primary supporting systems, like MDM.
We had a fairly restrictive set of tools for using the EDW data (enterprise BI tools), so it was easier to GOVERN how the data would be used.
Volume, Variety, Veracity and Velocity
Spark would make this easier and could leverage the same DQ code