(Presented at MapR's Big Data Everywhere event in Redwood City, CA in December 2016)
The relationship between business teams and IT has changed as the complexity of data has increased. A traditional data pipeline, built for an IT-centered approach to information management, cannot meet the data demands of today's business decisions. Designing a big data strategy means modernizing those earlier approaches. Self-service data preparation in a collaborative, intuitive, governed, and secure environment is the key to a nimble and decisive business unit.
The promise of self-service analytics asserts that business users should be empowered to make data-driven decisions quickly without having to involve the analytics team, while critics say it could lead to faulty choices. In this presentation we’ll cover topics such as acknowledging diverse customer needs, choosing the right tools, understanding the pitfalls, and considering the future of self-service analytics. And cake.
Big Data Day LA 2016 / Hadoop/Spark/Kafka track - Panel - Interactive Applic... (Data Con LA)
In this interactive panel discussion, you will hear from these Spark experts as to why they chose to go "all-in" on Spark, leveraging the rich core capabilities that make Spark so exciting, and committing to significant IP that turns Spark into a world-class enterprise data preparation engine.
Raymond and David will explain specific cases where capabilities were built on top of core Spark to provide a truly interactive data prep application experience. Innovations include a Domain Specific Language (DSL), an optimizing compiler, a persistent columnar caching layer, application-specific Resilient Distributed Datasets (RDDs), and online aggregation operators, which together overcome the core memory, pipelining, and shuffling obstacles to produce a highly interactive application with Spark's user and data-volume scale-out benefits.
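One of the techniques named above, online aggregation, can be illustrated outside Spark: rather than waiting for a full scan, the operator streams progressively refined partial results so the UI stays responsive. A minimal pure-Python sketch of the idea (not the presenters' implementation):

```python
def online_mean(values, report_every=1000):
    """Stream running estimates of the mean instead of waiting for a full scan.

    Yields (rows_seen, current_estimate) pairs so an interactive front end
    can display progressively refined answers while the scan is in flight.
    """
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
        if count % report_every == 0:
            yield count, total / count
    if count and count % report_every != 0:
        yield count, total / count  # final exact answer

# Early estimates converge toward the exact answer as more rows arrive.
estimates = list(online_mean(range(10000), report_every=2500))
```

The same contract (partial results, then an exact final answer) is what makes an operator like this usable behind an interactive data prep UI.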
General Data Protection Regulation - BDW Meetup, October 11th, 2017 (Caserta)
Caserta Presentation:
General Data Protection Regulation (GDPR) is a business and technical challenge for companies worldwide - and the deadlines are coming fast! American institutions that do business in the EU or have customers from the EU will have their data practices affected. With this in mind, Caserta – joined by Waterline Data, Salt Recruiting, and Squire Patton Boggs – hosted a BDW Meetup on the GDPR, which is perhaps the most controversial data legislation that has been passed to date.
Joe Caserta, Founding President, Caserta, spoke on the basics of the GDPR, how it will impact data privacy around the world, and some techniques geared towards compliance.
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C... (Caserta)
Joe Caserta explores the world of analytics, tech, and AI to paint a picture of where business is headed. This presentation is from the CDAO Exchange in Miami 2018.
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote (Caserta)
The “Big Data era” has ushered in an avalanche of new technologies and approaches for delivering information and insights to business users. What is the role of the cloud in your analytical environment? How can you make your migration as seamless as possible? This closing keynote, delivered by Joe Caserta, a prominent consultant who has helped many global enterprises adopt Big Data, provided the audience with the inside scoop needed to supplement data warehousing environments with data intelligence—the amalgamation of Big Data and business intelligence.
This presentation was given as the closing keynote at DBTA's annual Data Summit in NYC.
Joe Caserta's 2016 Data Summit workshop, "Introduction to Data Science with Hadoop," held May 9, expanded on his Intro to Data Science workshop from last year's Summit. Again, Joe presented to a standing-room-only audience, focusing on the data lake, governance, and the role of the data scientist.
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Using Machine Learning & Spark to Power Data-Driven Marketing (Caserta)
Joe Caserta presents a statistically driven model for understanding the customer path to purchase, combining online, offline, and third-party data sources. He shows how customer data is fed to machine learning, which assigns weighted credit to customer interactions, giving insight into which marketing activities truly matter. This presentation is from Caserta's February 2018 Big Data Warehousing Meetup, co-hosted with Databricks.
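The weighted-credit idea can be sketched in a few lines. The toy below scores each channel by how much its presence lifts conversion rates, then splits a conversion's credit across the touchpoints on a customer's path; it is a crude stand-in for the statistical model in the talk, with entirely hypothetical channel names and data:

```python
from collections import defaultdict

def channel_weights(paths, converted):
    """Estimate a credit weight per channel from observed paths.

    weight(c) = conversion rate of paths containing c minus the overall
    conversion rate, floored at zero -- a crude stand-in for the machine
    learning model described in the talk.
    """
    overall = sum(converted) / len(converted)
    seen = defaultdict(lambda: [0, 0])  # channel -> [paths seen, conversions]
    for path, conv in zip(paths, converted):
        for c in set(path):
            seen[c][0] += 1
            seen[c][1] += conv
    return {c: max(conv / n - overall, 0.0) for c, (n, conv) in seen.items()}

def attribute(path, weights):
    """Split one conversion's credit across the touchpoints on its path."""
    w = {c: weights.get(c, 0.0) for c in set(path)}
    total = sum(w.values())
    if total == 0:
        return {c: 1.0 / len(w) for c in w}  # fall back to equal credit
    return {c: v / total for c, v in w.items()}

# Hypothetical customer journeys (channel touchpoints) and outcomes.
paths = [["display", "email"], ["email", "search"], ["display"], ["search"]]
converted = [1, 1, 0, 1]
weights = channel_weights(paths, converted)
```

A production model would use far richer features and a proper attribution method, but the shape is the same: learn per-interaction weights, then distribute credit along each path.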
There is an overwhelming list of expectations – and challenges – in this new, emerging and evolving role. In this presentation, given at the 2016 CDO Summit, Joe Caserta focuses on:
- Defining the CDO title
- Outlining the skills that enhance chances for success
- Listing all the many things the company thinks you are responsible for
- Providing an overview of the core technologies you need to be familiar with, which will ultimately support your success
- Presenting a concise list of the most pressing challenges
- Sharing insights and arguments for how best to meet the challenges and succeed in your new role
A modern, flexible approach to Hadoop implementation incorporating innovation... (DataWorks Summit)
A modern, flexible approach to Hadoop implementation incorporating innovations from HP Haven
Jeff Veis
Vice President
HP Software Big Data
Gilles Noisette
Master Solution Architect
HP EMEA Big Data CoE
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016 (Caserta)
Caserta Concepts Founder and President, Joe Caserta, gave this presentation at Strata + Hadoop World 2016 in New York, NY. His session covers path-to-purchase analytics using a data lake and Spark.
For more information, visit http://casertaconcepts.com/
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...) (Caserta)
The role of the Chief Data Officer (CDO) has become integral to the evolution needed to turn an intuition-driven company into an analytics-driven company. With Data Governance at the core of your responsibility, moving the innovation meter is a global challenge among CDOs. Specifically, the CDO must:
• Provide a single point of accountability for data initiatives and issues
• Innovate ways to use existing data and evangelize a data vision for the organization
• Support & enforce data governance policies via outreach, training & tools
• Work with IT to develop/maintain an enterprise data repository
• Set standards for analytical reporting and generate data insights through data science
In this session, Joe Caserta addresses real-world CDO challenges and shares techniques to overcome them, manage corporate disruption, and achieve success.
Focus on Your Analysis, Not Your SQL Code (DATAVERSITY)
Analysts in the line of business deal with a myriad of time-consuming data preparation and analytic challenges that often require IT or DBA intervention to deliver a requested dataset. Others have taught themselves “enough SQL to be dangerous”, learning the necessary code to extract the data needed to answer their business question. Self-service data analytics empowers these business analysts to take control of the entire analytics process, delivering the necessary results for better business decisions.
Join us to learn how self-service data analytics allows analysts to:
- Utilize a drag-and-drop workflow for data and analytic processes without writing code
- Minimize data movement and ensure data integrity through in-database capabilities
- Easily work across relational and non-relational databases to deliver faster business results
Self-service data analytics delivers a repeatable process that is transparent to not only business analysts, but also SQL coders and decision makers across the organization.
Smarter businesses apply AI to learn and continuously evolve the way they work. To extract full value from AI, companies need a data strategy that gives them access to all their data – no matter where it lives – in an environment that easily scales and applies the latest discovery technology, including advanced analytics, visualization, and AI. Learn how IBM Watson and Data provides all the tools companies need to embed AI, machine learning, and deep learning in their business, while enabling professionals to gain the most from their data to drive smarter business and lead industry-changing transformations.
Building a New Platform for Customer Analytics (Caserta)
Caserta Concepts and Databricks partner up to bring you this insightful webinar on how a business can choose from all of the emerging big data technologies to figure out which one best fits their needs.
Data Catalog as the Platform for Data Intelligence (Alation)
Data catalogs are in wide use today across hundreds of enterprises as a means to help data scientists and business analysts find and collaboratively analyze data. Over the past several years, customers have increasingly used data catalogs in applications beyond their search & discovery roots, addressing new use cases such as data governance, cloud data migration, and digital transformation. In this session, the founder and CEO of Alation will discuss the evolution of the data catalog, the many ways data catalogs are used today, the importance of machine learning in data catalogs, and the future of the data catalog as a platform for a broad range of data intelligence solutions.
Using Machine Learning to Understand and Predict Marketing ROI (DATAVERSITY)
Marketing is all about attracting, retaining, and building profitable relationships with your customers, but how do you know which customers to target, which campaigns to run, and which marketing programs to invest in to get the most return for your dollar?
Join Alteryx and Keyrus as we demonstrate how to combine all relevant marketing, sales and customer data, and perform sophisticated analytics to deepen customer insight and calculate ROI of marketing programs.
You’ll walk away knowing how to:
- Segment and profile your customers – take that raw data and translate it into real value
- Build a marketing attribution model within Alteryx, creating a personal answer engine for your company
- Leverage R or Python code in an Alteryx workflow so data scientists can collaborate with non-coding stakeholders in a code-friendly and code-free environment
Join Alteryx and Keyrus and get the actionable insights you need to drive marketing ROI analytics, and answer million-dollar questions without spending millions of dollars on standardized solutions.
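The ROI calculation behind campaign comparison is simple enough to show directly. A small sketch with entirely hypothetical spend and attributed-revenue figures (not data from the webinar):

```python
def campaign_roi(spend, attributed_revenue):
    """ROI = (attributed revenue - spend) / spend."""
    return (attributed_revenue - spend) / spend

# Hypothetical campaign results: name -> (spend, attributed revenue).
campaigns = {
    "paid_search": (50_000, 180_000),
    "display":     (80_000,  95_000),
    "email":       (10_000,  60_000),
}

# Rank campaigns by return per dollar, highest first.
ranked = sorted(campaigns, key=lambda c: campaign_roi(*campaigns[c]), reverse=True)
```

The hard part, of course, is the attributed-revenue input: that is where the attribution modeling described above earns its keep.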
Joe Caserta was a featured speaker, along with MIT Sloan School faculty and other industry thought-leaders. His session 'You're the New CDO, Now What?' discussed how new CDOs can accomplish their strategic objectives and overcome tactical challenges in this emerging executive leadership role.
In its tenth year, the MIT CDOIQ Symposium 2016 continues to explore the developing role of the Chief Data Officer.
For more information, visit http://casertaconcepts.com/
Caserta Concepts, Datameer, and Microsoft shared their combined knowledge and a use case on big data, the cloud, and deep analytics. Attendees learned how a global leader in the test, measurement, and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focused on how to extend and optimize Hadoop-based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability, and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft - Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Data lineage to drive compliance and as a business imperative (Leigh Hill)
The importance of data lineage has escalated in recent years in response to regulatory demand and increased business understanding of the benefits it can deliver. Like all capital markets technology, data lineage presents both challenges and opportunities, so how best can it be implemented and sustained? And how can your organisation reap the rewards of successful implementation?
This webinar will outline data lineage, its progress towards automation, and why it is so important from both a regulatory and business perspective. It will also provide advice on how to select a solution and step-by-step guidance on how to implement and integrate data lineage. Finally, the webinar will discuss how to manage data lineage to ensure regulatory compliance, deliver business benefits and plan for the future.
Register for the webinar to find out more about:
- The importance of data lineage in capital markets
- How to select a solution for your organisation
- Approaches to implementation and integration
- How to achieve sustainable regulatory compliance
- The business benefits of successful implementation
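At its core, lineage is a dependency graph over datasets, and the regulatory questions ("where did this report's numbers come from?") are graph traversals. A minimal sketch with hypothetical dataset names, not any particular vendor's tool:

```python
def upstream(lineage, node):
    """Return every source that feeds `node`, transitively.

    `lineage` maps each dataset to the datasets it is directly derived
    from -- a minimal stand-in for an automated lineage store.
    """
    seen = set()
    stack = [node]
    while stack:
        for parent in lineage.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Hypothetical capital-markets lineage: dataset -> direct sources.
lineage = {
    "risk_report":   ["positions", "market_prices"],
    "positions":     ["trades_raw"],
    "market_prices": ["vendor_feed"],
}
```

An auditor asking what feeds `risk_report` gets the full transitive set of sources; real tools add column-level edges, transformation metadata, and automated capture on top of exactly this structure.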
DataOps: Nine steps to transform your data science impact - Strata London May 18 (Harvinder Atwal)
According to Forrester Research, only 22% of companies are currently seeing a significant return from data science expenditures. Most data science implementations are high-cost IT projects, local applications that are not built to scale for production workflows, or laptop decision support projects that never impact customers. Despite this high failure rate, we keep hearing the same mantra and solutions over and over again. Everybody talks about how to create models, but not many people talk about getting them into production where they can impact customers.
Harvinder Atwal offers an entertaining and practical introduction to DataOps, a new and independent approach to delivering data science value at scale, used at companies like Facebook, Uber, LinkedIn, Twitter, and eBay. The key to adding value through DataOps is to adapt and borrow principles from Agile, Lean, and DevOps. However, DataOps is not just about shipping working machine learning models; it starts with better alignment of data science with the rest of the organization and its goals. Harvinder shares experience-based solutions for increasing your velocity of value creation, including Agile prioritization and collaboration, new operational processes for an end-to-end data lifecycle, developer principles for data scientists, cloud solution architectures to reduce data friction, self-service tools giving data scientists freedom from bottlenecks, and more. The DataOps methodology will enable you to eliminate daily barriers, putting your data scientists in control of delivering ever-faster cutting-edge innovation for your organization and customers.
Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim... (DATAVERSITY)
J.B. Hunt, one of the leading providers of transportation and logistics services in North America, recognizes that customer responsiveness, service quality, and operational efficiency are critical to its success. However, with its data spread across multiple sources, including legacy mainframe systems, the organization struggled to meet data requirements from multiple departments, troubleshoot operational issues, and respond to customers quickly.
Join this webinar to hear about the optimized solution J.B. Hunt implemented, which automates real-time data pipelines for a reliable cloud data lake and gives multiple user groups an in-the-moment view of data without overwhelming internal operational systems. Discover how J.B. Hunt now leverages a modernized data environment to accelerate data delivery and drive AI and analytics initiatives such as real-time service pricing, competitive counterbidding, and improved customer experience.
Learn how you can:
• Ingest data in real-time from legacy mainframe systems, enterprise applications, and more
• Create a reliable cloud data lake to accelerate AI and analytics initiatives
• Catalog, prepare, and provision data to empower data consumers
• Drive operational efficiency and customer experience with AI-augmented insights
Chief Data & Analytics Officer Fall Boston - Presentation (Srinivasan Sankar)
Data Asset Catalog & Metadata Management - Is It a Fad or Is It the Future?
- Many have dubbed metadata as “the new black,” but is this accurate?
- How to leverage metadata management to streamline data governance and ensure transparency
- Improving data quality and ensuring consistency and accuracy of data across various reporting systems
- Looking at the flip side: what are the additional training requirements and value-added for the business?
DI&A Slides: Data Lake vs. Data Warehouse (DATAVERSITY)
Modern data analysis is moving beyond the Data Warehouse to the Data Lake, where analysts are able to take advantage of emerging technologies to manage complex analytics on large data volumes and diverse data types. Yet, for some business problems, a Data Warehouse may still be the right solution.
If you’re on the fence, join this webinar as we compare and contrast Data Lakes and Data Warehouses, identifying situations where one approach may be better than the other and highlighting how the two can work together.
Get tips, takeaways and best practices about:
- The benefits and problems of a Data Warehouse
- How a Data Lake can solve the problems of a Data Warehouse
- Data Lake Architecture
- How Data Warehouses and Data Lakes can work together
Managing uncertainty in data - Presentation at Data Science Northeast Netherl... (University of Twente)
Managing uncertainty in data: the key to effective management of data quality problems
Business analytics and data science are significantly impaired by a wide variety of 'data handling' issues, especially when data from different sources are combined and when unstructured data is involved. The root cause of many such problems centers on data semantics and data quality. We have developed a generic method based on modeling such problems as uncertainty *in* the data. A recently conceived kind of DBMS, the UDBMS or "Uncertain Database", can store, manage, and query large volumes of uncertain data. Together, they allow one, for example, to postpone the resolution of data problems and to assess their influence on analytical results. We furthermore develop technology for data cleansing, web harvesting, and natural language processing that uses this method to deal with the ambiguity of natural language and many other problems encountered when using unstructured data.
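The core idea of storing uncertainty *in* the data can be sketched with possible-worlds-style alternatives: each uncertain attribute holds (value, probability) pairs, and queries return probability-weighted answers. This toy is only an illustration of the semantics, not the UDBMS from the talk:

```python
def expected_sum(uncertain_rows, attr):
    """Probability-weighted sum over uncertain tuples.

    Each row stores its alternatives for `attr` as (value, probability)
    pairs, so an ambiguous extraction need not be resolved before querying.
    """
    total = 0.0
    for row in uncertain_rows:
        total += sum(v * p for v, p in row[attr])
    return total

# Hypothetical uncertain table: the first amount is an ambiguous extraction.
rows = [
    {"amount": [(100.0, 0.9), (1000.0, 0.1)]},
    {"amount": [(250.0, 1.0)]},
]
```

Because the ambiguity is retained, one can quantify how much a data problem actually moves an aggregate before deciding whether it is worth cleaning.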
There is an overwhelming list of expectations – and challenges – in this new, emerging and evolving role. In this presentation, given at the 2016 CDO Summit, Joe Caserta focuses on:
- Defining the CDO title
- Outlining the skills that enhance chances for success
- Listing all the many things the company thinks you are responsible for
- Providing an overview of the core technologies you need to be familiar with and will serve to ultimately support your success
- Presenting a concise list of the most pressing challenges
- Sharing insights and arguments for how best to meet the challenges and succeed in your new role
A modern, flexible approach to Hadoop implementation incorporating innovation...DataWorks Summit
A modern, flexible approach to Hadoop implementation incorporating innovations from HP Haven
Jeff Veis
Vice President
HP Software Big Data
Gilles Noisette
Master Solution Architect
HP EMEA Big Data CoE
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
Caserta Concepts Founder and President, Joe Caserta, gave this presentation at Strata + Hadoop World 2016 in New York, NY. His session covers path-to-purchase analytics using a data lake and spark.
For more information, visit http://casertaconcepts.com/
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Caserta
The role of the Chief Data Officer (CDO) has become integral to the evolution needed to turn a wisdom-driven company into an analytics-driven company. With Data Governance at the core of your responsibility, moving the innovation meter is a global challenge among CDOs. Specifically the CDO must:
• Provide a single point of accountability for data initiatives and issues
• Innovate ways to use existing data and evangelize a data vision for the organization
• Support & enforce data governance policies via outreach, training & tools
• Work with IT to develop/maintain an enterprise data repository
• Set standards for analytical reporting and generate data insights through data science
In this session, Joe Caserta addresses real-word CDO challenges, shares techniques to overcome them, manage corporate disruption and achieve success.
Focus on Your Analysis, Not Your SQL CodeDATAVERSITY
Analysts in the line of business deal with a myriad of time-consuming data preparation and analytic challenges that often require IT or DBA intervention to deliver a requested dataset. Others have taught themselves “enough SQL to be dangerous”, learning the necessary code to extract the data needed to answer their business question. Self-service data analytics empowers these business analysts to take control of the entire analytics process, delivering the necessary results for better business decisions.
Join us to learn how self-service data analytics allows analysts to:
- Utilize a drag-and-drop workflow for data and analytic processes without writing code
- Minimize data movement and ensure data integrity through in-database capabilities
- Easily work across relational and non-relational databases to deliver faster business results
Self-service data analytics delivers a repeatable process that is transparent to not only business analysts, but also SQL coders and decision makers across the organization.
Smarter businesses apply AI to learn and continuously evolve the way they work. To extract full value from AI, companies need data strategy that gives them access to all their data – no matter where it lives – in an environment that easily scales and applies the latest discovery technology including advanced analytics, visualization and AI. Learn how IBM Watson and Data provides all the tools companies need to embed AI, machine learning and deep learning in their business, while enabling professionals to gain the most from their data to drive smarter business and lead industry-changing transformations.
Building a New Platform for Customer Analytics Caserta
Caserta Concepts and Databricks partner up to bring you this insightful webinar on how a business can choose from all of the emerging big data technologies to figure out which one best fits their needs.
Data Catalog as the Platform for Data IntelligenceAlation
Data catalogs are in wide use today across hundreds of enterprises as a means to help data scientists and business analysts find and collaboratively analyze data. Over the past several years, customers have increasingly used data catalogs in applications beyond their search & discovery roots, addressing new use cases such as data governance, cloud data migration, and digital transformation. In this session, the founder and CEO of Alation will discuss the evolution of the data catalog, the many ways in which data catalogs are being used today, the importance of machine learning in data catalogs, and discuss the future of the data catalog as a platform for a broad range of data intelligence solutions.
Using Machine Learning to Understand and Predict Marketing ROIDATAVERSITY
Marketing is all about attracting, retaining and building profitable relationships with your customers, but how do you know which customers to target, which campaigns to run, and which marketing programs to invest in, to get most return for your dollar?
Join Alteryx and Keyrus as we demonstrate how to combine all relevant marketing, sales and customer data, and perform sophisticated analytics to deepen customer insight and calculate ROI of marketing programs.
You’ll walk away knowing how to:
Segment and profile your customers – take that raw data and translate it into real value
Build a marketing attribution model within Alteryx, creating a personal answer engine for your company.
Leverage R or Python code in an Alteryx workflow so data scientists can collaborate with non-coding stake holders in a code-friendly and code-free environment.
Join Alteryx and Keyrus and get the actionable insights you need to drive marketing ROI analytics, and answer million-dollar questions without spending millions of dollars on standardized solutions.
Joe Caserta was a featured speaker, along with MIT Sloan School faculty and other industry thought-leaders. His session 'You're the New CDO, Now What?' discussed how new CDOs can accomplish their strategic objectives and overcome tactical challenges in this emerging executive leadership role.
In its tenth year, the MIT CDOIQ Symposium 2016 continues to explore the developing role of the Chief Data Officer.
For more information, visit http://casertaconcepts.com/
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Data lineage to drive compliance and as a business imperativeLeigh Hill
The importance of data lineage has escalated in recent years in response to regulatory demand and increased business understanding of the benefits it can deliver. Like all capital markets technology, data lineage presents both challenges and opportunities, so how best can it be implemented and sustained? And how can your organisation reap the rewards of successful implementation?
This webinar will outline data lineage, its progress towards automation, and why it is so important from both a regulatory and business perspective. It will also provide advice on how to select a solution and step-by-step guidance on how to implement and integrate data lineage. Finally, the webinar will discuss how to manage data lineage to ensure regulatory compliance, deliver business benefits and plan for the future.
Register for the webinar to find out more about:
-The importance of data lineage in capital markets
-How to select a solution for your organisation
-Approaches to implementation and integration
-How to achieve sustainable regulatory compliance
-The business benefits of successful implementation
DataOps: Nine steps to transform your data science impact Strata London May 18Harvinder Atwal
According to Forrester Research, only 22% of companies are currently seeing a significant return from data science expenditures. Most data science implementations are high-cost IT projects, local applications that are not built to scale for production workflows, or laptop decision support projects that never impact customers. Despite this high failure rate, we keep hearing the same mantra and solutions over and over again. Everybody talks about how to create models, but not many people talk about getting them into production where they can impact customers.
Harvinder Atwal offers an entertaining and practical introduction to DataOps, a new and independent approach to delivering data science value at scale, used at companies like Facebook, Uber, LinkedIn, Twitter, and eBay. The key to adding value through DataOps is to adapt and borrow principles from Agile, Lean, and DevOps. However, DataOps is not just about shipping working machine learning models; it starts with better alignment of data science with the rest of the organization and its goals. Harvinder shares experience-based solutions for increasing your velocity of value creation, including Agile prioritization and collaboration, new operational processes for an end-to-end data lifecycle, developer principles for data scientists, cloud solution architectures to reduce data friction, self-service tools giving data scientists freedom from bottlenecks, and more. The DataOps methodology will enable you to eliminate daily barriers, putting your data scientists in control of delivering ever-faster cutting-edge innovation for your organization and customers.
Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim... - DATAVERSITY
J.B. Hunt, one of the leading providers of transportation and logistics services in North America, recognizes the criticality of customer responsiveness, service quality, and operational efficiency for its success. However, with its data spread across multiple sources, including legacy mainframe systems, the organization was struggling to meet data requirements from multiple departments. They struggled to troubleshoot operational issues and respond to customers quickly.
Join this webinar to hear about the optimized solution J.B. Hunt implemented, which automates real-time data pipelines for a reliable cloud data lake and provides multiple user groups an in-the-moment view of data without overwhelming internal operational systems. Discover how J.B. Hunt now leverages a modernized data environment to accelerate data delivery and drive various AI and analytics initiatives such as real-time service pricing, competitive counterbidding, and an improved customer experience.
Learn how you can:
• Ingest data in real-time from legacy mainframe systems, enterprise applications, and more
• Create a reliable cloud data lake to accelerate AI and Analytic Initiatives
• Catalog, prepare, and provision data to empower data consumers
• Drive operational efficiency and customer experience with AI-augmented insights
Chief Data & Analytics Officer Fall Boston - Presentation - Srinivasan Sankar
Data Asset Catalog & Metadata Management - Is It a Fad or Is It the Future?
Many have dubbed metadata as “the new black,” but is this accurate?
How to leverage metadata management to streamline data governance and ensure transparency
Improving data quality and ensuring consistency and accuracy of data across various reporting systems
Looking at the flip side: what are the additional training requirements and value-added for the business?
DI&A Slides: Data Lake vs. Data Warehouse - DATAVERSITY
Modern data analysis is moving beyond the Data Warehouse to the Data Lake where analysts are able to take advantage of emerging technologies to manage complex analytics on large data volumes and diverse data types. Yet, for some business problems, a Data Warehouse may still be the right solution.
If you’re on the fence, join this webinar as we compare and contrast Data Lakes and Data Warehouses, identifying situations where one approach may be better than the other and highlighting how the two can work together.
Get tips, takeaways and best practices about:
- The benefits and problems of a Data Warehouse
- How a Data Lake can solve the problems of a Data Warehouse
- Data Lake Architecture
- How Data Warehouses and Data Lakes can work together
Managing uncertainty in data - Presentation at Data Science Northeast Netherl... - University of Twente
Managing uncertainty in data: the key to effective management of data quality problems
Business analytics and data science are significantly impaired by a wide variety of 'data handling' issues, especially when data from different sources are combined and when unstructured data is involved. The root cause of many such problems centers around data semantics and data quality. We have developed a generic method which is based on modeling such problems as uncertainty *in* the data. A recently conceived new kind of DBMS can store, manage, and query large volumes of uncertain data: the UDBMS or "Uncertain Database". Together, they allow one to, e.g., postpone the resolution of data problems, assess what their influence is on analytical results, etc. We furthermore develop technology for data cleansing, web harvesting, and natural language processing which uses this method to deal with ambiguity of natural language and many other problems encountered when using unstructured data.
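As a toy illustration of the idea (a hypothetical sketch, not the UDBMS described above), an uncertain attribute can be stored as weighted alternatives rather than forced to a single cleaned value, so analysis can proceed while the conflict stays visible:

```python
# Hypothetical sketch of uncertainty *in* the data: instead of cleaning a
# conflicting attribute to one value, keep all alternatives with weights.
from dataclasses import dataclass

@dataclass
class Uncertain:
    alternatives: dict  # possible value -> probability (summing to 1.0)

    def expected(self):
        # Expected value over the alternatives.
        return sum(v * p for v, p in self.alternatives.items())

# Two sources disagree about a customer's age; keep both instead of guessing.
age = Uncertain({34: 0.7, 43: 0.3})  # 43 may be a digit-swap typo

# Analytics can run now and later assess the influence of the unresolved problem.
print(round(age.expected(), 1))  # 36.7
```

Postponing resolution this way makes it possible to measure how much an unresolved quality problem actually moves an analytical result, which is the benefit the abstract claims for the approach.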
Big data. Small data. All data. You have access to an ever-expanding volume of data inside the walls of your business and out across the web. The potential in data is endless – from predicting election results to preventing the spread of epidemics. But how can you use it to your advantage to help move your business forward?
Data is growing exponentially and it’s now possible to mine and unlock insights from data in new and unexpected ways. Empower your business to take advantage of this data by harnessing the rich capabilities of Microsoft SQL Server and the familiarity of Microsoft Office to help organize, analyze, and make sense of your data—no matter the size.
This talk was held at the 13th meeting on Sept 23rd 2014 by Bruno Ungermann.
Conceptual overview of Hadoop based analytics, comparison between data warehouse architecture and Big Data architecture, characteristics of „schema on read“, typical Big Data use cases like customer analytics, operational analytics and EDW optimization, short software demo
The “death” of the data warehouse has been overhyped for some time now, but it’s no secret that growth in this segment of the market has been slowing. We now see a major shift in the application of this technology to the cloud, where Amazon led the way with an on-demand cloud data warehouse in Redshift. Redshift was AWS’s fastest growing service, but it now has competition from Google with BigQuery,
Why performance testing?
- 2012: Research showed that Amazon would lose $1.6 billion in sales every year if its site took one more second to load.
- 2013: 39% of e-retailers claimed they lost money last year due to performance or stability problems.
- 2014: The web performance monitoring company Catchpoint Systems looked at aggregate performance on Black Friday and compared it to the same timeframe in 2013. The results are notable: desktop web pages were 19.85 percent slower, while mobile web pages were a whopping 57.21 percent slower.
- 2015: Some major e-retailers’ sites buckled under the pressure of heavy holiday traffic during 2015’s Cyber Monday peak traffic times.
Exploring Data Preparation and Visualization Tools for Urban Forestry - Azavea
This webinar was held on December 12, 2012 and provided an overview of free and low-cost tools for cleaning and preparing data and building useful and beautiful data visualizations.
Driving Retail Success with Machine Data Intelligence - Sumo Logic
Gain a competitive edge this holiday season by harnessing the power of machine data. Watch the on-demand webinar to learn how the out-of-the-box integration between Sumo Logic and Akamai allows organizations to:
• Gain a competitive edge by identifying purchasing trends in real-time
• Improve service by correlating Akamai data sets for reduced errors and downtime
• Strengthen security posture through compliance and web application firewall (WAF) monitoring
• Elastically scale to meet unforeseen or projected spikes in business
• Streamline order management, store performance and loss prevention
See the integration in action.
How Can You Calculate the Cost of Your Data? - DATAVERSITY
Today, self-service, Cloud and big data technologies make new data preparation capabilities necessary…and possible. But, we've all been through the hype cycle and know the trough of disillusionment can come on hard and fast.
Organizations have been trying to solve the data quality problem and democratize insights for years, spending millions of dollars and dedicating an increasing amount of resources to managing and governing the data. The result? Everyone is still looking to solve the problem.
Data preparation offers a new paradigm, but how can you avoid another round of minimal business impact? We’ll review a true data ROI model that helps organizations understand the value of existing versus modern data management architectures.
Database Camp 2016 @ United Nations, NYC - Javier de la Torre, CEO, CARTO - Eric David Benari, PMP
Database Driven Location Intelligence: The Missing Dimension
Javier de la Torre, Founder & CEO, CARTO
Video of this session at the Database Camp conference at the UN is on http://www.Database.Camp
A reference architecture for a service built from 100% open-source components, ready for deployment in the cloud, with enterprise-grade scalability and reliability.
Anton Ovchinnikov, Grid Dynamics
We’re excited to announce that we are evolving our cloud application architecture to be more flexible and modular, giving you greater control of your environment and more choices for components, deployment options and infrastructure.
During this webcast we'll provide more information on Engine Yard Cloud's new cluster model, infrastructure abstraction layer and monitoring and alerting agent, share what's coming and have an open Q&A to answer your questions.
This presentation was prepared for a Webcast where John Yerhot, Engine Yard US Support Lead, and Chris Kelly, Technical Evangelist at New Relic discussed how you can scale and improve the performance of your Ruby web apps. They shared detailed guidance on issues like:
Caching strategies
Slow database queries
Background processing
Profiling Ruby applications
Picking the right Ruby web server
Sharding data
Attendees will learn how to:
Gain visibility on site performance
Improve scalability and uptime
Find and fix key bottlenecks
See the on-demand replay:
http://pages.engineyard.com/6TipsforImprovingRubyApplicationPerformance.html
Data Lake Architecture – Modern Strategies & Approaches - DATAVERSITY
Data Lake or Data Swamp? By now, we’ve likely all heard the comparison. Data Lake architectures have the opportunity to provide the ability to integrate vast amounts of disparate data across the organization for strategic business analytic value. But without a proper architecture and metadata management strategy in place, a Data Lake can quickly devolve into a swamp of information that is difficult to understand. This webinar will offer practical strategies to architect and manage your Data Lake in a way that optimizes its success.
The first step towards understanding data assets’ impact on your organization is understanding what those assets mean for each other. Metadata – literally, data about data – is a practice area required by good systems development, and yet is also perhaps the most mislabeled and misunderstood Data Management practice. Understanding metadata and its associated technologies as more than just straightforward technological tools can provide powerful insight into the efficiency of organizational practices and enable you to combine practices into sophisticated techniques supporting larger and more complex business initiatives. Program learning objectives include:
- Understanding how to leverage metadata practices in support of business strategy
- Discussing foundational metadata concepts
- Guiding principles for, and lessons previously learned from, metadata and its practical uses applied to strategy
Metadata strategies include:
- Metadata is a gerund so don’t try to treat it as a noun
- Metadata is the language of Data Governance
- Treat glossaries/repositories as capabilities, not technology
Decision Ready Data: Power Your Analytics with Great Data - DLT Solutions
Murthy Mathiprakasam, Principal Product Marketing Manager at Informatica, shares how to power your analytics with great data from the 2015 Informatica Government Summit.
Too often I hear the question “Can you help me with our data strategy?” Unfortunately, for most, this is the wrong request because it focuses on the least valuable component: the data strategy itself. A more useful request is: “Can you help me apply data strategically?” Yes, at early maturity phases the process of developing strategic thinking about data is more important than the actual product! Trying to write a good (much less perfect) data strategy on the first attempt is generally not productive, particularly given the widespread acceptance of Mike Tyson’s truism: “Everybody has a plan until they get punched in the face.” This program refocuses efforts on learning how to iteratively improve the way data is strategically applied. This will permit data-based strategy components to keep up with agile, evolving organizational strategies. It also contributes to three primary organizational data goals. Learn how to improve the following:
- Your organization’s data
- The way your people use data
- The way your people use data to achieve your organizational strategy
This will help in ways never imagined. Data are your sole non-depletable, non-degradable, durable strategic assets, and they are pervasively shared across every organizational area. Addressing existing challenges programmatically includes overcoming necessary but insufficient prerequisites and developing a disciplined, repeatable means of improving business objectives. This process (based on the theory of constraints) is where the strategic data work really occurs as organizations identify prioritized areas where better assets, literacy, and support (data strategy components) can help an organization better achieve specific strategic objectives. Then the process becomes lather, rinse, and repeat. Several complementary concepts are also covered, including:
- A cohesive argument for why data strategy is necessary for effective data governance
- An overview of prerequisites for effective strategic use of data strategy, as well as common pitfalls
- A repeatable process for identifying and removing data constraints
- The importance of balancing business operation and innovation
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc... - Denodo
Watch full webinar here: https://bit.ly/3offv7G
Presented at AI Live APAC
Advanced data science techniques, like machine learning, have proven an extremely useful tool to derive valuable insights from existing data. Platforms like Spark, and complex libraries for R, Python and Scala put advanced techniques at the fingertips of the data scientists. However, these data scientists spend most of their time looking for the right data and massaging it into a usable format. Data virtualization offers a new alternative to address these issues in a more efficient and agile way.
Watch this on-demand session to learn how companies can use data virtualization to:
- Create a logical architecture to make all enterprise data available for advanced analytics exercises
- Accelerate data acquisition and massaging, providing the data scientist with a powerful tool to complement their practice
- Integrate popular tools from the data science ecosystem: Spark, Python, Zeppelin, Jupyter, etc.
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra... - Big Data Week
We are all aware of the challenges enterprises face with growing data and siloed data stores. The business cannot make reliable decisions with untrusted data, and on top of that, it lacks access to all the data inside and outside the enterprise needed to stay ahead of the competition and make key business decisions.
This session will take a deep dive into the challenges businesses face today and how to build a Modern Data Architecture using emerging technologies such as Hadoop, Spark, NoSQL data stores, MPP data stores, and scalable, cost-effective cloud solutions such as AWS, Azure, and Bigstep.
Why Your Data Science Architecture Should Include a Data Virtualization Tool ... - Denodo
Watch full webinar here: https://bit.ly/35FUn32
Presented at CDAO New Zealand
Advanced data science techniques, like machine learning, have proven an extremely useful tool to derive valuable insights from existing data. Platforms like Spark, and complex libraries for R, Python, and Scala put advanced techniques at the fingertips of the data scientists.
However, most architecture laid out to enable data scientists miss two key challenges:
- Data scientists spend most of their time looking for the right data and massaging it into a usable format
- Results and algorithms created by data scientists often stay out of the reach of regular data analysts and business users
Watch this session on-demand to understand how data virtualization offers an alternative that addresses these issues and can accelerate data acquisition and massaging, along with a customer story on the use of machine learning with data virtualization.
These slides—based on the webinar featuring John L Myers, managing research director for data and analytics at leading IT analyst firm Enterprise Management Associates (EMA), and Neil Barton, chief technology officer at WhereScape—highlight how the world of streaming data pipelines and automation practices for analytical environments intersect to provide value to both business stakeholders and corporate technologists.
View these slides to learn about:
- Drivers behind the growth of streaming usage scenarios
- Challenges that streaming data presents
- Value of automation techniques and technologies
- Benefits of applying automation to streaming data pipelines
- How WhereScape® automation with Streaming can fast-track streaming data use in your data landscape
What does data governance have in common with an amusement park? - Denodo
Watch full webinar here: https://bit.ly/3Ab9gYq
Imagine arriving at an amusement park with your family and starting your day without the usual map that lets you plan which shows to see, which rides to go on, and where the children can or cannot ride. You probably would not get the most out of your day, and you would miss a lot. Some people like to explore as they go, but when it comes to business, improvising can be fatal.
In an era when information is exploding across disparate sources, data governance is key to guaranteeing the availability, usability, integrity, and security of that information. Likewise, the set of processes, roles, and policies it defines enables organizations to reach their goals while ensuring the efficient use of their data.
Data virtualization, a strategic tool for implementing and optimizing data governance, lets companies create a 360º view of their data and establish security controls and access policies across the entire infrastructure, regardless of format or location. It brings together multiple data sources, makes them accessible from a single layer, and provides lineage capabilities to monitor changes in the data.
In this webinar you will learn how to:
- Accelerate the integration of data from fragmented sources in internal and external systems and obtain an integral view of the information.
- Enable a single, protected data-access layer across the entire enterprise.
- Use data virtualization as the foundation for complying with current data protection regulations through auditing, cataloging, and data security.
Enabling data scientists within an enterprise requires a well-thought out approach from an organization, technology, and business results perspective. In this talk, Tim and Hussain will share common pitfalls to data science enablement in the enterprise and provide their recommendations to avoid them. Taking an example, actionable use case from the financial services industry, they will focus on how Anaconda plays a pivotal role in setting up big data infrastructure, integrating data science experimentation and production environments, and deploying insights to production. Along the way, they will highlight opportunities for leveraging open source and unleashing data science teams while meeting regulatory and compliance challenges.
Implementing an efficient data governance and security strategy with ... - Denodo
Watch full webinar here: https://bit.ly/3lSwLyU
In an era when information is exploding across disparate sources, data governance is a key component in guaranteeing the availability, usability, integrity, and security of information. Likewise, the set of processes, roles, and policies it defines enables organizations to reach their goals while ensuring the efficient use of their data.
Data virtualization is one of the strategic tools for implementing and optimizing data governance. The technology lets companies create a 360º view of their data and establish security controls and access policies across the entire infrastructure, regardless of format or location. It brings together multiple data sources, makes them accessible from a single layer, and provides lineage capabilities to monitor changes in the data.
We invite you to join this webinar to learn:
- How to accelerate the integration of data from fragmented sources in internal and external systems and obtain an integral view of the information.
- How to enable a single, protected data-access layer across the entire enterprise.
- How data virtualization provides the foundation for complying with current data protection regulations through auditing, cataloging, and data security.
Data-Centric Analytics and Understanding the Full Data Supply Chain - DATAVERSITY
While model development is an important part of analytics, this activity can be compromised by a lack of understanding of the data used in these models and poor Data Quality. For insights to be relied upon and truly actionable, data-related issues must be addressed.
The data supply chain (the set of architectural components that moves data around the enterprise from points where it is created or acquired to points where it is used) must be managed to supply the needs of analytics and other constituencies.
This webinar describes how the data supply chain should be designed and operated to provide analytics with the data it needs, and how Data Scientists should interact with the data supply chain to obtain the data they need. It also covers:
- Data-centric considerations that must be taken into account in the development of analytic models
- Features of a modern data supply chain
- Major components in the data supply chain, with a focus on Data Lakes
- Major roles and responsibilities in the data supply chain
- How analytics must interact with the data supply chain
Similar to Reinventing the Modern Information Pipeline: Paxata and MapR
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group (“MCG”) expects demand to evolve alongside supply, as institutional investment rotates out of offices and into work-from-home (“WFH”) infrastructure, while the need for data storage keeps expanding with global internet usage; experts predict 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as advancing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next four years.
Whilst competitive headwinds remain, exemplified by the recent second bankruptcy filing of Sungard, which blamed “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has made key adjustments, and MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment, will drive market momentum forward. The continuous injection of capital by alternative investment firms, as well as growing infrastructure investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x by value in 2026, will likely help propel data center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
3. Paxata’s mission (since 2012): deliver the only enterprise-grade data preparation platform for everyone to transform raw, meaningless data into valuable, contextual and complete information.
4-7. The data chasm (Source: Gartner News Room: http://www.gartner.com/newsroom/id/2975018)
- 83%: Companies agree that data is their most strategic asset
- 80%: Time analysts will spend trying to create data sets to draw insights
- 12%: Amount of data most companies estimate they are analyzing
10. Traditional data preparation creates a bottleneck: business teams have complex data sources for analytics projects.
11. Business teams funnel their requirements to IT (IT-centric data preparation: Business -> Information).
12. IT runs requirements through a linear ETL process (Model, Extract, Transform, Load, Optimize) executed with manual scripting or coding.
13. IT reviews with business, makes changes, fixes errors. (Repeat.)
14. Business teams make decisions before data is available, or ask for changes and restart the process.
15. Designed for highly specialized technical people to prepare data for business teams.
16. Designing for highly specialized technical people to prepare data for business teams is expensive, complicated, error-prone, and time-consuming.
18. Modern architecture: balancing freedom with responsibility. Built for business: freedom and flexibility with collaboration.
19. Collect and manage data over time. Built for business: freedom and flexibility with collaboration. Enabled by IT: data governance, scale, efficiency.
20. The modern information pipeline is built for business (freedom and flexibility with collaboration) and enabled by IT (data governance, scale, efficiency).
22-23. Data prep must address the full range of information workers, from deep technical skills to limited technical skills (Source: Forrester Research, Inc., “Info Workers Will Erase The Boundary Between Enterprise and Consumer Technologies,” August 30, 2012):
- Data Scientists (200K)
- Data Developers (600K)
- Data Analysts (100M)
- Business Analysts (275M)
- Information Workers (460M)
Deliver the only enterprise-grade data preparation platform that lets everyone transform raw, meaningless data into valuable, contextual and complete information
To seize the opportunity you must cross this data chasm.
Why? Because it’s hard.
Traditional, legacy technologies and processes that companies currently leverage were NOT designed for the variety and volume of data that companies are working with today.
Companies need to be more nimble.
We have many customers with tens of millions invested annually in traditional ETL processes, and they were still spending too much time preparing data rather than on the value-added tasks of analytics.
They selected Paxata to complement these technologies and fill the gaps with a more exploratory, interactive experience.
Visual Data Discovery Tools - people had a hunger to get at and dig into their data, beyond traditional small spreadsheets or databases.
1. Business teams funnel their data requirements to IT.
2. IT runs the requirements through a linear ETL process, executed with manual scripting or coding.
3. IT reviews with business, makes changes, fixes errors, and repeats this cycle.
4. By then, business teams have made decisions long before the data is available, or they ask for changes and restart the process.
Traditional technologies do not meet today’s needs: batch, complicated, no visibility, IT only, time-consuming, error-prone, expensive.
Legacy infrastructure for data preparation was never designed to scale to the orders of magnitude more data, and orders of magnitude more consumers, of today’s information-driven world.
A model in which a small set of highly skilled IT data scientists and data developers take business requirements and then execute a highly prescribed, lengthy, waterfall process for preparing data, only to realize more often than not that they missed the mark because they lacked the business context, is not viable.
Slide use: problem of data (option 4)
This is a five-part slide. Use this along with the 4 slides before it.
Talking Points: Big Data and self-service analytics necessitate a fundamental transformation from an IT-centric data preparation process to a self-service data preparation model. In the self-service model, the steps that make up data preparation – data integration, quality, cleansing, enrichment, and shaping – don't go away; they need to be re-imagined in a way that enables the business or data analyst to accomplish these tasks on their own, which in turn empowers them to work with vertical slices of relevant data and get the results they want, when they need them. However, it's important that the self-service model also provide the governance and traceability that IT requires to maintain trust in data and analytic results. In this new model, IT's role shifts to collecting and centralizing access to raw data and to providing the business with the right infrastructure to drive self-service data preparation and analytics, while maintaining full governance.
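To make the preparation steps named above concrete, here is a minimal illustrative sketch in pandas of what an analyst's self-service workflow covers: integration, quality/cleansing, enrichment, and shaping. The data, column names, and tiering rule are hypothetical; this is not Paxata's implementation, just an analogy for the categories of work involved.

```python
import pandas as pd

# Integration: combine two raw sources on a shared key (hypothetical data)
orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": [100.0, None, 250.0, 75.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["west", "east", "west"],
})
df = orders.merge(customers, on="customer_id", how="left")

# Quality / cleansing: fill missing amounts and drop exact duplicate rows
df["amount"] = df["amount"].fillna(0.0)
df = df.drop_duplicates()

# Enrichment: derive a new attribute from existing columns
df["tier"] = df["amount"].apply(lambda a: "high" if a >= 100 else "low")

# Shaping: aggregate into the vertical slice an analyst actually needs
summary = df.groupby("region", as_index=False)["amount"].sum()
print(summary)
```

The point of self-service tools is that a business analyst gets equivalent results through an interactive, visual interface, without writing this code, while the platform records the steps for governance and traceability.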
Slide use: Who are the data analysts
Talking points: This pyramid describes the typical information-worker roles in today's enterprises and highlights the dramatic scale that self-service data preparation can bring. Legacy and many Big Data tools target the Data Scientist and the Data Developer, but as you can see there are vastly more data analysts out there, and self-service data prep empowers them to drive their own data destiny, breaking the logjam of traditional IT-constrained ETL and data preparation. By Data Analysts, we mean power Excel users or Tableau users who understand data and analytics but don't write code or scripts. For self-service data prep to truly transform an organization, it must empower the data analyst; at the same time, because self-service data prep simplifies many traditionally complex and time-consuming preparation operations, it can also dramatically accelerate the work of data scientists and data developers.
Source: Prakash VC deck