This document provides an overview of big data and Hadoop. It discusses the key concepts of big data, including volume, velocity, and variety. It then explains what Hadoop is and how it works, using HDFS for storage and MapReduce for processing. The document outlines some of Hadoop's strengths, such as scalability and affordability, as well as limitations, such as requiring advanced technical skills. It then introduces Platfora and how it builds on Hadoop to make big data accessible and useful for business users without separate tools. The presentation ends by encouraging participation in women-in-STEM programs for a chance to win Hadoop training.
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H... (BigDataEverywhere)
Hadoop use cases have historically trended towards cost reduction through data warehouse offload. More recently, an uptick in customer-centric use cases has proven the ability of Hadoop to drive top-line revenue. In this session, Platfora solution architect Rob Rosen will discuss how the ability to correlate multi-structured data in Hadoop leads to greater customer adoption, expanded cross-selling and reduced customer churn for enterprises deploying Hadoop-centric data lakes.
Analyzing Unstructured Data in Hadoop Webinar (Datameer)
Unstructured data is growing 62% per year, faster than structured data. According to Gartner, data volumes are set to grow 800% in aggregate over the next 5 years, and 80% of that data will be unstructured.
This on-demand webinar will highlight and discuss:
How applying big data analytics to unstructured data can help you gain richer, deeper and more accurate insights to gain competitive advantages
The sources of unstructured data which include email, social media platforms, CRM systems, call center platforms (including notes and speech-to-text transcripts), and web scrapes
How monitoring the communications of your customers and prospects enables you to make time-sensitive decisions and jump on new business opportunities
The Heart of Data Modeling: 7 Ways Your Agile Project is Managing Data Wrong (DATAVERSITY)
Is your organization using agile approaches to systems development projects? Have you found that there are conflicting opinions about what should be done, when it should be done and who should do it? Is there even a suggestion that data modeling isn’t needed on an Agile project? Are your data architects stuck in a waterfall world? Are you asking for “no more changes” to the data model? Do your developers think that “just the right documentation” means no modeling allowed? Does anyone even know where the reference data for the application is located? Or how it is updated?
In this month’s webinar, Karen will show you how data modeling and Agile approaches CAN work together to deliver quality information systems and solutions, with fewer dysfunctions and fewer tears.
Business in the Driver’s Seat – An Improved Model for Integration (Inside Analysis)
The Briefing Room with Dr. Robin Bloor and WhereScape
Live Webcast on September 30, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=bfff40f7c9645fc398770ea11152b148
The fueling of information systems will always require some effort, but a confluence of innovations is fundamentally changing how quickly and accurately it can be done. Gone are long cycle times for development. Today, organizations can embrace a more rapid and collaborative approach for building analytical applications and data warehouses. The key is to have business experts working hand-in-hand with data professionals as the solutions take shape, thus expediting the speed to valuable insights.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor as he explains the changing nature of information design. He’ll be briefed by WhereScape President Mark Budzinski, who will discuss his company’s data warehouse automation solutions and how they enable collaborative development. He will share use cases that illustrate how, by aligning business and IT, organizations can enable faster and more agile data warehouse development.
Visit InsideAnalysis.com for more information.
Data Exploration and Analytics for the Modern Business (DATAVERSITY)
Every day, your business generates enormous quantities of data. How can you unlock its value? How can you build self-service exploration experiences that empower frontline decision-makers?
This webinar features Greg Jones from Smartling and Scott Hoover from Looker. Smartling is a powerful software platform for managing translation and localization of digital content. Looker is a data exploration platform that operates in the database. Together, Greg and Scott will introduce you to a modern approach to managing analytics in today’s fast-growing, web-centric business environments.
What are actionable insights? (Introduction to Operational Analytics Software) (Newton Day Uploads)
What Are Actionable Insights? In this presentation I outline what Actionable Insights are and the Operational Analytics Software that can produce them. And because Business Intelligence and the Business Intelligence Software market can be so confusing for buyers, I've attempted to position where Actionable Insights and Operational Analytics fit in the Business Intelligence 'story'.
Modernizing Architecture for a Complete Data Strategy (Cloudera, Inc.)
Data is the future of business. Either take advantage of it, or get surpassed by those who do.
In this webinar, Ovum's Tony Baer discusses the importance of building a modern data strategy that ensures your journey with Apache Hadoop and big data is a successful one. Together, we'll walk through how to build a plan for long-term success while realizing short-term gains, including:
How to pinpoint the business goals that matter most
How to assess your strengths and weaknesses to meet those goals
How to build a thoughtful approach that ensures your initiatives succeed
Objectivity/DB: A Multipurpose NoSQL Database (InfiniteGraph)
The speakers will describe the flexible configuration possibilities that Objectivity/DB provides, with an emphasis on how best to distribute data across multiple storage nodes. The session will start by describing the distributed processing architecture of Objectivity/DB before covering the new Placement Manager features. The speakers will also describe how Objectivity/DB compares and contrasts with other NoSQL solutions.
A modern, flexible approach to Hadoop implementation incorporating innovation... (DataWorks Summit)
A modern, flexible approach to Hadoop implementation incorporating innovations from HP Haven
Jeff Veis, Vice President, HP Software Big Data
Gilles Noisette, Master Solution Architect, HP EMEA Big Data CoE
Joe Caserta's 2016 Data Summit Workshop "Introduction to Data Science with Hadoop" on May 9 expanded on his Intro to Data Science Workshop held at last year's Summit. Again, Joe presented to a standing-room-only audience, with a focus on the data lake, governance and the role of the data scientist.
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Back to Square One: Building a Data Science Team from Scratch (Klaas Bosteels)
Generally speaking, big data and data science originated in the west and are coming to Europe with a bit of a delay. There is at least one exception though: The London-based music discovery website Last.fm is a data company at heart and has been doing large-scale data processing and analysis for years. It started using Hadoop in early 2006, for instance, making it one of the earliest adopters worldwide. When I left Last.fm to join Massive Media, the social media company behind Netlog.com and Twoo.com, I basically moved from a data science forerunner to a newcomer. Massive Media had at least as much data to play with and tremendous potential, but they were not doing much with it yet. The data science team had to be built from the ground up and every step had to be argued for and justified along the way. Having done this exercise of evaluating everything I learned at Last.fm and starting over completely with a clean slate at Massive Media, I developed a pretty clear perspective on how to find good data scientists, what they should be doing, what tools they should be using, and how to organize them to work together efficiently as a team, which is precisely what I would like to share in this talk.
Getting Started with Big Data for Business Managers (Datameer)
Big Data has become critical to the enterprise because of the massive amount of untapped data sources, and the potential to gain new insights that were previously not possible. So, how to get started with Big Data and Hadoop becomes a question more pertinent than ever before.
Listen to leading analyst at Ovum, Tony Baer, as he discusses answers to the key questions around how to:
-- Approach Big Data and associated business challenges
-- Identify what types of new insights can be revealed by Big Data
-- Staff for this undertaking and implement the technology necessary to be successful
-- Take the first steps toward getting started with Big Data on Hadoop
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendees learned how a global leader in the test, measurement and control systems market reduced their big data implementation time from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focused on how to extend and optimize Hadoop-based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Both digital and traditional businesses are constantly evolving, and the need to move fast is a pervasive reality. Delivering what customers want and need goes beyond the creation of delivery channels. In fact, it relies on the company’s ability to produce, consume, organise, understand, curate, and distribute data.
In this presentation, Dan Aragao and Simon Hope provide a glimpse of the journey ThoughtWorks and REA are currently undergoing to create a truly data-centric, cutting-edge digital business.
What is the value of big data? How does a user get that value?
Before, analysts would have to wait months, relying on IT for a new report or for changes to an existing one. Now, analysts can shrink that time down to days or even minutes. On top of that, analysts can ask questions that were not possible before. In this webinar, we’ll show you how this analysis is possible and the value that has been achieved by customers.
In this session, you will learn:
How analysts get value out of big data
How to visualize data at every step of analysis
How analysts can do big data analytics without IT, in one product
Against the backdrop of Big Data, the Chief Data Officer, by any name, is emerging as the central player in the business of data, including cybersecurity. The MITCDOIQ Symposium explored the developing landscape, from local organizational issues to global challenges, through case studies from industry, academic, government and healthcare leaders.
Joe Caserta, president at Caserta Concepts, presented "Big Data's Impact on the Enterprise" at the MITCDOIQ Symposium.
Presentation Abstract: Organizations are challenged with managing an unprecedented volume of structured and unstructured data coming into the enterprise from a variety of verified and unverified sources. With that comes the urgency to rapidly maximize value while also maintaining high data quality.
Today we start with some history and the components of data governance and information quality necessary for successful solutions. I then bring it all to life with 2 client success stories, one in healthcare and the other in banking and financial services. These case histories illustrate how accurate, complete, consistent and reliable data results in a competitive advantage and enhanced end-user and customer satisfaction.
To learn more, visit www.casertaconcepts.com
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote (Caserta)
The “Big Data era” has ushered in an avalanche of new technologies and approaches for delivering information and insights to business users. What is the role of the cloud in your analytical environment? How can you make your migration as seamless as possible? This closing keynote, delivered by Joe Caserta, a prominent consultant who has helped many global enterprises adopt Big Data, provided the audience with the inside scoop needed to supplement data warehousing environments with data intelligence—the amalgamation of Big Data and business intelligence.
This presentation was given as the closing keynote at DBTA's annual Data Summit in NYC.
There is an overwhelming list of expectations – and challenges – in this new, emerging and evolving role. In this presentation, given at the 2016 CDO Summit, Joe Caserta focuses on:
- Defining the CDO title
- Outlining the skills that enhance chances for success
- Listing all the many things the company thinks you are responsible for
- Providing an overview of the core technologies you need to be familiar with and that will ultimately support your success
- Presenting a concise list of the most pressing challenges
- Sharing insights and arguments for how best to meet the challenges and succeed in your new role
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016 (Caserta)
Caserta Concepts Founder and President, Joe Caserta, gave this presentation at Strata + Hadoop World 2016 in New York, NY. His session covers path-to-purchase analytics using a data lake and Spark.
For more information, visit http://casertaconcepts.com/
Creating a Data Driven Organization - StampedeCon 2016 (StampedeCon)
Companies today are all focused on finding new consumption models to better utilize the data they produce. This presentation will provide insights and best practices for creating the organization and sponsorship necessary to set the foundation for success.
For this session, Dan will provide an overview of the process and methodologies he employs to establish and sustain a Data Driven Culture. Key topics will include:
Data Driven Culture
Executive Sponsorship
Organizational Structure – Collaboration Hubs and Bi-Modal Analytics
Role of Hadoop and Big Data as Part of Data Driven Culture
The 20th annual Enterprise Data World (EDW) Conference took place April 17-21 in San Diego. It is recognized as the most comprehensive educational conference on data management in the world.
Joe Caserta was a featured presenter. His session, "Evolving from the Data Warehouse to Big Data Analytics - the Emerging Role of the Data Lake," highlighted the challenges and steps needed to become a data-driven organization.
Joe also participated in two panel discussions during the show:
• "Data Lake or Data Warehouse?"
• "Big Data Investments Have Been Made, But What's Next
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
How to Build a Successful Data Team - Florian Douetteau (@Dataiku) Dataiku
As you walk into your office on Monday morning, before you've even had a chance to grab a cup of coffee, your CEO asks to see you. He's worried: both customer churn and fraudulent transactions have increased over the past 6 months. As Data Manager, you have 6 months to solve this problem.
As Data Manager, you know the challenges ahead:
- Multitudes of technology choices to make
- Building a team and solving the skill-set disconnect
- Data can be deceiving...
- Figuring out what the successful data product must be
Florian has worked in the “data” field since 2001, back when it was not yet big. He worked in successful startups in the search engine, advertising, and gaming industries, holding various data or CTO roles. He started Dataiku in 2013, his first venture as a CEO, with the goal of alleviating the daily pains encountered by data teams all around.
During this Big Data Warehousing Meetup, Caserta Concepts and Databricks addressed the number one operational and analytic goal of nearly every organization today – to have a complete view of every customer. Customer Data Integration (CDI) must be implemented to cleanse and match customer identities within and across various data systems. CDI has been a long-standing data engineering challenge, not just one of logic and complexity but also of performance and scalability.
The speakers brought together best practice techniques with Apache Spark to achieve complete CDI.
Speakers:
Joe Caserta, President, Caserta Concepts
Kevin Rasmussen, Big Data Engineer, Caserta Concepts
Vida Ha, Lead Solutions Engineer, Databricks
The sessions covered a series of problems that are adequately solved with Apache Spark, as well as those that require additional technologies to implement correctly. Topics included:
· Building an end-to-end CDI pipeline in Apache Spark
· What works, what doesn’t, and how the use of Spark will evolve
· Innovation with Spark including methods for customer matching from statistical patterns, geolocation, and behavior
· Using PySpark and Python’s rich module ecosystem for data cleansing, standardization, and matching
· Using GraphX for matching and scalable clustering
· Analyzing large data files with Spark
· Using Spark for ETL on large datasets
· Applying Machine Learning & Data Science to large datasets
· Connecting BI/Visualization tools to Apache Spark to analyze large datasets internally
The speakers also touched on data governance, on-boarding new data rapidly, and how to balance rapid agility and time to market with critical decision support and customer interaction. They also shared examples of problems that Apache Spark is not optimized for.
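By way of illustration, here is a minimal PySpark sketch of the kind of cleansing-and-matching step such a CDI pipeline performs; the sample data, column names, and exact-match rule are hypothetical stand-ins, not the speakers' implementation (real pipelines use the fuzzier techniques listed above, such as statistical patterns, geolocation, and behavior):

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("cdi-sketch").getOrCreate()

# Hypothetical customer records from two source systems.
customers = spark.createDataFrame(
    [("1", " Jane Doe ", "JANE@EXAMPLE.COM"),
     ("2", "Jane Doe", "jane@example.com"),
     ("3", "John Smith", "john@example.com")],
    ["id", "name", "email"],
)

# Cleanse/standardize: trim whitespace and lowercase the matching key.
cleansed = customers.select(
    "id",
    F.trim(F.col("name")).alias("name"),
    F.lower(F.trim(F.col("email"))).alias("email"),
)

# Match: records sharing a standardized email are merged into one identity.
matched = cleansed.groupBy("email").agg(
    F.collect_set("id").alias("matched_ids"),
    F.first("name").alias("name"),
)
matched.show(truncate=False)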
For more information on the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/
Understanding What’s Possible: Getting Business Value from Big Data Quickly (Inside Analysis)
The Briefing Room with David Loshin and OpenText
Live Webcast April 14, 2015
https://bloorgroup.webex.com/bloorgroup/onstage/g.php?MTID=e079dc562543a394c5c5d0588e7cd9152
To be successful and practical in delivering meaningful insights, companies must embrace the three pillars of enterprise analytics: scalability, open standards, and speed to value. In doing so, organizations enable a range of options that can satisfy both data scientists and self-service business users alike. But getting there requires a thoughtful approach -- and some enterprise knowledge of statistical modeling. How can your company stay ahead of the game?
Register for this episode of The Briefing Room to learn from veteran Analyst David Loshin, as he explains why the fundamentals will always apply to the high-stakes game of analytics. He’ll be briefed by Allen Bonde of Actuate, now part of OpenText, who will showcase his company’s intelligence platform, which was designed from the ground up to embrace open standards and was purpose-built to serve large enterprises with a wide range of data needs. He'll demonstrate recent success stories using a number of Big Data sources, including device and machine data.
Visit InsideAnalysis.com for more information.
Moving Past Infrastructure Limitations Presented by MediaMath
This presentation was given at a Big Data Warehousing Meetup with Caserta Concepts, MediaMath and Qubole. You can learn more about the event here: http://www.meetup.com/Big-Data-Warehousing/events/228372516/
Event description:
At Caserta Concepts, we are firm believers in big data thriving on the cloud. The instant-on, nearly unlimited storage and computing capabilities of AWS have made it the de facto solution for a full spectrum of organizations needing to process large amounts of data.
What's more, an ecosystem of value-added platforms has emerged to further ease and democratize the implementation of cloud based solutions. Qubole has developed a great platform for easily deploying and managing ephemeral and long-lived Hadoop and Spark clusters on AWS.
Moving Past Infrastructure Limitations: Data Warehousing at MediaMath
Over the past year and a half, MediaMath has undertaken a “data liberation” effort in an attempt to leave their big-box, monolithic data warehouse behind. In this talk, Rory Sawyer, Software Engineer at MediaMath, will describe how this effort transformed MediaMath’s legacy architecture and legacy mindset, which imposed harsh inefficiencies on data sharing and utilization. The current mindset removes these inefficiencies and allows them to say “yes” to more projects and ideas.
Rory will also demo how MediaMath uses Amazon Web Services and Qubole so that infrastructure is no longer a limiting factor on what and how users query. This combination allows them to scale their resources up and down as needed while bridging different data sources and execution engines. Using and extending MediaMath’s data warehousing is no longer a privileged activity but an ability that every employee and client has.
Superfast Business offers fully funded support to help ambitious businesses in the South West, with a focus on rural areas, identify, maximise and profit from the opportunities that superfast broadband and new technologies present. They have a team of expert advisers, a programme of events on hot topics offering inspirational insights and practical solutions, and access to IT specialists and knowledge.
The service is aimed at businesses who have heard superfast broadband is coming to their area or are already experiencing good connection speeds and fulfill ERDF eligibility criteria.
Register on their website today to see if your business is able to access the full support package and keep up to date with the latest technologies and information.
w: www.superfastbusiness.co.uk
e: info@superfastbusiness.co.uk
t: 0845 603 8593
Career of the Software Engineer in Modern Open-Source e-Commerce Company (Vrann Tulika)
Eugene will talk about the key components of a successful career in software engineering. This will cover various subjects: the landscape of the modern IT business (fields and specializations of software); IT departments and roles in big companies; passing the interview and being a successful employee; specifics of open-source e-commerce software; and the importance of soft skills for career growth.
Clare Corthell: Learning Data Science Online (sfdatascience)
Clare Corthell, Data Scientist and Designer at Mattermark, and author of the Open Source Data Science Masters, shares her experience teaching herself data science with online resources. http://datasciencemasters.org/
Presentation on thinking digital and 10 Think Digital ideas by Dave Briggs from WorkSmart. Presented at the Hot Topic event on Building Digital Capability in Bristol on 2 October 2014.
Drinking from the Digital Data Fire Hose (Gigi Johnson)
Digital change expert Dr. Gigi Johnson, Executive Director of the Maremel Institute, will share this webinar on April 17, 2014, with the US Department of Housing and Urban Development's OCIO Learning Sessions online. She will discuss five steps to both grow and simplify how we can use abundant data to make better daily and strategic decisions. She will address questions such as: How can I use the data that I can get now, at a reasonable price and with reasonable use of time, to help my work thrive? How can I find ways to SAVE time and energy around data? How can I have the right data when I need it for decisions? And can I create systems and structures to make this daunting task a little simpler? These slides share the content planned for this event. More information can be found at http://portal.hud.gov/hudportal/HUD?src=/press/multimedia/videos.
Yes, I still do KM and KM is not dead. I thought I would share the basic deck that I use in workshops that are part of my KM Assessment and Strategy consulting practice. In addition to interviews, surveys, and inventories, it is important during a KM assessment to educate and engage the organization.
Take your digital workplace training to the next level (DWCNZ) (Rebecca Jackson)
With the pace of change in digital tools, it can be hard to keep up and deliver great, relevant training information and content. In this session we cover the ‘Digital Workplace’ building blocks, a framework which underpins NEXTDC's digital strategy and connects training to company values and ways of working (rather than to technology).
Gartner predicts that the role of the Citizen Data Scientist will grow 5X faster than its highly trained counterparts (the Data Scientist). Learn more about the rise of this emerging class.
Views From The C-Suite: Who's Big on Big Data (Platfora)
The way that big data pervades most organizations today creates a dynamic environment for C-level executives to explore how it can and should be used strategically to add business value.
While each C-level executive views big data through a unique lens, a strong consensus exists among them about the need for effective big data analytics across their organizations.
This Economist Intelligence Unit report shows that senior executives are optimistic about both the capabilities of big data and the impacts such data can have on their businesses.
Download the report to get the whole story.
Driving A Data-Centric Culture: The Leadership Challenge (Platfora)
Embracing data as a corporate asset—and a source of competitive advantage—is not just a “good idea” that companies should consider. Such adoption will help determine the winners and losers across multiple markets and industries in the future.
In the last couple of years, corporate focus has shifted: first, from investing in the right technology and tools; then to acquiring the right talent and skills; and now to building the right organizational culture that can realize the business value of powerful big-data analytic tools.
Most organizations today are still focused on putting in place the right technology and talent, but others have evolved further and are working toward fostering a data-centric corporate culture.
Driving A Data-Centric Culture: A Bottom Up Opportunity (Platfora)
Big data has captured the attention of business leaders in almost every industry. Building big-data capabilities has found its place on the corporate agenda, and leading companies are moving forward on promoting a data-centric culture.
Most data-driven companies are focused on the leadership challenge of inspiring this cultural shift. To date, however, little has been said about the role of middle management and lower-level employees in spreading and institutionalizing a data-centric culture.
Gain a Holistic View of your Customer's Journey (Platfora)
Today, companies are capturing information about customers at every touchpoint, but the reality is that most companies are working with siloed marketing data because they’re using disparate tools to track online, offline, web, social, mobile, and advertising data.
In this presentation, Rod Fontecilla, VP of Application Modernization at Unisys, explains how his team uses Platfora to analyze, interact and understand data to drive customer success at Unisys.
Rod will highlight three specific Unisys use cases of Platfora, one of which involved a timely text survey sentiment analysis that produced insights enabling a course correction in favor of improved customer satisfaction.
The Big Data Gusher: Big Data Analytics, the Internet of Things and the Oil B... (Platfora)
The proliferation of machine sensors, interaction, and transaction data is driving a significant transformation within the oil and gas industry. Some industry analysts estimate that correctly implementing big data analytics can provide a 4-8% improvement in operational efficiency for oil companies. Other research shows that nearly 90% of oil industry executives rate big data analytics as a top priority, while fewer than a third have implemented solutions.
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to part 3 of the UiPath Test Automation using UiPath Test Suite series. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We ended with a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a PASSION for making things work, along with a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
3. Topics
• What's big data?
• What is my role really about?
• How did I… ?
• If I had a time machine…
• A little more data…
• Where are my girls at?
5. Big data is the term for a collection of data sets so large and complex that it becomes difficult to process them using traditional storage and processing applications.
6. The 3Vs of Big Data
[Diagram: Volume, Velocity, and Variety surrounding "Big Data", with a fourth, unspoken V – Value]
7. A Brief History of Big Data and Hadoop
[Timeline: the term 'big data' is coined; Google founded; Google publishes GFS/MapReduce papers; Hadoop open source project founded; Hadoop becomes top-level Apache project; Facebook releases Hive (SQL on Hadoop); 750 attend 3rd Hadoop Summit; Platfora founded; 2700 attend 5th Hadoop Summit; Platfora GA release]
8. What is Hadoop?
• Open-source software application framework
• Designed for processing large datasets
• Distributed over multiple commodity servers
• With built-in fault detection and recovery
• Consisting of two key services:
• Distributed file storage (HDFS)
• Distributed data processing (MapReduce)
10. How MapReduce Processing Works
[Diagram: the classic word count example. Stages: Input Files → Split → Map → Sort / Shuffle → Reduce → Final Output. Input files of color words are split into records; map tasks emit (word, 1) pairs such as yellow,1 and blue,1; the sort/shuffle groups matching keys; the reduce phase sums each group into the final output: yellow,2 blue,3 red,3 orange,2 green,2]
11. So … Hadoop Is Great!
• It's inexpensive
• It's scalable
• It's powerful
• It makes data consolidation easy
12. But … Hadoop Has Its Limitations
• Queries require advanced technical skills
• Business users cannot access the data directly
• Batch processing is too slow for interactive analysis
14. How Platfora Works with Hadoop
• Hadoop is a powerful platform for storing and processing big data
• But it doesn't meet all of the data needs of the enterprise
• Platfora leverages the power of Hadoop and builds on its strengths
• Platfora makes data in Hadoop accessible to enterprise business users
• Without the need for separate ETL, Data Warehouse, and BI tools
16. The Platfora Pipeline for BI on Hadoop
[Diagram: source data in HDFS and Hive]
1. Connect to source data in Hadoop
2. Define and model datasets
3. Build lenses to pull data into Platfora
4. Visually analyze the data
21. How we do it: Engineering (and design)
[Diagram: Waterfall Model vs. Agile – the phases Conception, Initiation, Analysis, Design, Construction, Testing, and Deployment shown for each]
25. Mission
Lead a cross-functional team that prides itself on effectively and efficiently delivering to the needs of Platfora customers
26. What Does a Software Program Manager Do?
• Project Management taken 'up a level'
• Lead the cross-functional team and execution
• Identify what needs to be done & partner with others to ensure it happens
• Collaboration across functions
• Roles & Responsibilities
• Advocate accountability
• 'Project' health and status
Sometimes…
27. What Else Does a Software Program Manager Do?
• Efficiency and effectiveness
• Continually evaluate and enhance operational performance
• What could we be doing better tomorrow?
• How will this positively impact our customers?
• Product Roadmap
• Translate business objectives into execution strategy
28. Keys to a Successful Program Manager
• Excellent Communication skills – at all levels of the org
• Trusted & Respectful Relationships across the org
• Kind, but Pushy – tactfully probe, question and challenge
• Sense of Urgency, Responsibility and Accountability
• Judgment – lead teams to make the right decisions
• Flexibility
• Ability to Influence
• Positive attitude
• Have fun!
29. One Path to Program Management
[Diagram: career path timeline, including Post Communications; legend: = acquisition]
35. What it Takes to Work in Tech Pubs
• Strong technical knowledge (or aptitude)
• Excellent writing skills
• Excellent listening, research, and information-organizing skills
• Sympathy for the impatient or frustrated user
• Humility and curiosity
• Sense of responsibility
• Resourcefulness and self-sufficiency
37. How Do I Balance Work with My Life?
• Figure out what works for YOU and your situation
• Use technology and time wisely
• Perception is key
• Show you are engaged and productive when it is most important
• Telecommuting & Mobility tools **
• Spend time working, not commuting
• WiFi is everywhere
• Didn't get to that 'one last thing' before you left the office – wrap it up during alternate hours (e.g., after the kids are in bed)
** But don't underestimate the value of face time
[Cartoon: "Sure my project management system has a few flaws, but I am sticking with it."]
38. How Do I Balance Work with My Life?
• Set expectations & keep boundaries
• Set expectations up front; avoid difficult discussions after the fact
• Prioritize – it's OK if you can't do it all at once
• Beware of the self-imposed guilt trip
• Make time to be 'offline'
• Explore your options if you take a leave
• You don't have to return to the SAME job
• Ease back in (part-time → full-time)
[Cartoon: "Saying NO feels empowering, except for the guilt."]
39. How Do I Balance Work with My Life?
• Plan ahead, where possible
• Plan the week's personal commitments alongside your work calendar
• Block time for standing personal commitments on your work calendar
• Allow time to establish yourself before making a personal / life change
• Too many changes all at once can be overwhelming and impact your work
43. So I worked at Nordstrom
[Diagram: sales associate → assistant manager → manager → assistant buyer → regional buyer → geek transition]
44. Geek Transition – Getting a Foot in the Door
• Get educated
• Take classes
• Research, read, experiment
• Build a portfolio of work
• Internships
• Open Source projects, Non-profits, Small businesses
• Network
• Professional organizations
• Social media *
45. My (Not so Straight-forward) Career in Technology
[Diagram: career timeline; legend: = acquisition, = happy accident]
50. Sive's Words of Wisdom…
• Don't be afraid to ask questions
• Pay attention to your audience
• Establish a network of trusted leaders/mentors
51. Daria's Words of Wisdom…
• Know your value
• Know what drives you
• Know your comfort zone … then push the boundaries
52. Denise's Words of Wisdom
• Be open – "feedback is a gift"
• Manage up, don't manage the message up
• Periodically assess your career and outstanding tasks as if you are preparing your legacy
54. Leadership
"a process of social influence in which one person can enlist the aid and support of others in accomplishment of a common task" [Thank you, Wikipedia!]
60. Cognitive Diversity
• "Cognitive diversity goes beyond job function or titles (where even diverse multifunctional innovation teams can come together yet still fail to come up with truly innovative ideas and development) and gets to a root-level differentiator of the way people look at the world and how they communicate that vision." [Innovation Excellence]
61. Higher return on sales for companies with 3 or more female board directors
62. Higher return on investment for technology companies with more women on their management team
64. The fine print…
You must tweet a photo of yourself participating in a women and STEM program by October 31. In your photo, @mention Bay Area Girl Geek Dinners (@BayAreaGGD) and Platfora (@Platfora).
What do you get…
You'll be entered into a drawing to win a free Hadoop training class*
*Valued up to $2,000.
Volume = Size of the Data. The size of the data is relative to the organization as to what "big" means, and the target changes as hardware advances. But if a dataset is growing exponentially (think tweets, sensor data, application logs) and it is too big to fit in memory on a single machine, then it is big data. Usually we're talking terabytes or petabytes.
Velocity = The time to process or query the data. From batch processing to real-time to streaming data. For some applications, such as fraud detection, 10 minutes is too late. Data must be used as it streams into your organization. Example: process 5 million stock trades a day to detect potential fraud.
Variety = Big data can be in any format – structured or unstructured – web logs, log files, sensor data, call records. New insights are found when analyzing all of this data together.
Value = The unspoken V is value. Is the data useful, accurate, and meaningful? Does it provide valid insights? This is something that requires human input – people need access to big data to determine its value.
1997 – The term "big data" is first used in a paper published by Michael Cox and David Ellsworth called "Application-controlled demand paging for out-of-core visualization".
2004 – Google engineers Jeff Dean and Sanjay Ghemawat publish a paper describing Google File System and Google MapReduce, the proprietary technology that allowed Google to do exabyte-scale data management and parallel processing using commodity hardware.
2006 – Engineers Doug Cutting and Mike Cafarella develop HDFS and MapReduce to support Yahoo search based on Google's design, and the Hadoop open source project is born.
2008 – Hadoop becomes a top-level Apache project and Facebook releases Hive, a SQL query interface for Hadoop.
2009 – The Hadoop Summit, in its third year, has 750 attendees.
2011 – Platfora is founded and begins development on a solution to simplify access to Hadoop data for business users. Attendance at the Hadoop Summit triples in 2 years.
2013 – Platfora goes GA and a new era of Big Data is born!
To understand Platfora, you first need to understand Hadoop. We will briefly cover what Hadoop is and how it works at a high level.
So what is Hadoop? It is a software application framework originally designed by Google in their earlier days so they could usefully index all of the internet data they were collecting, and then present meaningful results to their users. At the time, there was nothing on the market that could do that, so Google built their own platform. Hadoop is an open source project that was founded based on a paper released by Google describing their innovations. Yahoo was a major contributor to Hadoop and played a key role in developing Hadoop for enterprise applications.
Hadoop was purpose-built for processing large datasets containing complex data that could not easily be described in table format. It was designed to use the disk, memory, and compute resources of multiple commodity servers networked together. Because failures are common and expected in distributed computing, the Hadoop framework handles replicating the data across multiple machines so it can continue data processing uninterrupted in the event of a hardware failure.
The Hadoop framework consists of two key services: the Hadoop Distributed File System (HDFS) handles the storage and replication of the data files, and MapReduce does the data processing on the data where it resides.
These two core services – HDFS and MapReduce – are the foundation of Hadoop. Together they provide a storage and processing platform for big data. There are other components in the Hadoop ecosystem as well, such as Hive (a SQL-like interface for defining MapReduce jobs) or Pig (a data flow language for defining MapReduce jobs), but HDFS and MapReduce are the major components of Hadoop.
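As a concrete illustration of the HDFS side, here is a minimal sketch of loading a local file into HDFS using Hadoop's Java FileSystem API; the NameNode address and file paths are hypothetical.

```java
// Minimal sketch: copy a local log file into HDFS using Hadoop's Java API.
// The fs.defaultFS address and file paths below are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020"); // hypothetical NameNode
        try (FileSystem fs = FileSystem.get(conf)) {
            // HDFS transparently splits the file into blocks and replicates
            // them across DataNodes (3 copies by default).
            fs.copyFromLocalFile(new Path("/tmp/weblogs.txt"),
                                 new Path("/data/raw/weblogs.txt"));
        }
    }
}
```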
Here is a simple diagram of how a MapReduce job works, using the famous Hadoop word count example (count the occurrences of each word in the input data files). A MapReduce job is divided into two main phases – MAP and REDUCE.
The MAP phase splits the files into records that can be worked on in parallel, and passes them to individual mapper tasks. The map tasks process the records into key/value pairs. The maps are then sorted and shuffled to consolidate the matching keys.
The REDUCE phase then aggregates the individual map results into a final result. (A code sketch of this job follows below.)
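Here is a minimal sketch of that word count job using the standard Hadoop MapReduce Java API (the job driver is omitted); it follows the phases described above: the mapper emits (word, 1) pairs, and after the sort/shuffle the reducer sums the counts for each word.

```java
// Minimal sketch of the word count job described above, using the classic
// Hadoop MapReduce Java API. Mapper emits (word, 1); reducer sums the counts.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split each input record into words and emit (word, 1).
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // After the sort/shuffle, all counts for a word arrive together;
            // sum them to produce the final (word, total) pair.
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }
}
```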
There is no doubting that Hadoop is very good at what it does – storing and processing large amounts of data of disparate schemas and formats. There is a reason why Hadoop is considered THE platform for Big Data.
Inexpensive – Hadoop clusters often cost 50 to 100 times less per terabyte of storage than a traditional data warehouse. It is open source and runs on commodity hardware, so many enterprises are drawn to the price/performance ratio without vendor lock-in.
Scalable – Hadoop was designed to store and process large amounts of data at scale. Hadoop scales linearly and can grow organically as your data grows. It scales in both storage capacity and compute capacity. Along with scaling, it is fault-tolerant in the face of hardware failures.
Powerful – Hadoop can do complex data processing tasks on volumes of data that traditional relational database systems can't handle. It can work on all types of data, structured and unstructured.
Consolidation – It is relatively easy to get data into Hadoop. Since Hadoop is a filesystem and not a database, you can store data from multiple sources – structured or unstructured – and combine all different types of data in one platform. This 'data reservoir' approach is attractive to many companies looking to eliminate data silos and have one data repository for all the company's BI and analytics needs.
On the other hand, Hadoop is not good at everything that the enterprise needs to do with their data. Namely, it is not readily accessible to business users.
Hard to Query – To query data in Hadoop, you have to know how to write MapReduce programs (i.e., have Java programming skills) or use a specialized query language like Hive or Pig (see the sketch after this note). Since the average business user does not have these skills, there is a burden on technical Hadoop experts to write the queries to get at the data and make sense of it. Hadoop requires a technical middle-man between the data and the consumers of the data.
Data Access – To store data in Hadoop, you do not need to define the schema or specify any metadata about the data itself. This makes it really easy to get data into Hadoop. However, it makes it hard to know what data you have available to you and how to make use of it. Again, business users have to rely on technical experts to know what data is there and what questions are possible.
Slow Batch Processing – Hadoop is powerful, but it is limited to batch-oriented processing of data. You cannot query Hadoop in real-time. Many companies use the powerful batch processing capabilities of Hadoop, but then move the data to a different system that can be queried in real-time by their data analysts.
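To make the skills point concrete, here is a minimal sketch of querying data in Hadoop through Hive's JDBC interface (HiveServer2) rather than writing a raw MapReduce program; the host, credentials, table, and column names are hypothetical, and the Hive JDBC driver must be on the classpath.

```java
// Minimal sketch: run a HiveQL query against HiveServer2 over JDBC.
// Host, credentials, table, and column names are hypothetical.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver.example.com:10000/default", "analyst", "");
             Statement stmt = conn.createStatement();
             // Hive compiles this SQL-like query into MapReduce jobs behind the scenes.
             ResultSet rs = stmt.executeQuery(
                     "SELECT page, COUNT(*) AS hits FROM weblogs GROUP BY page")) {
            while (rs.next()) {
                System.out.println(rs.getString("page") + "\t" + rs.getLong("hits"));
            }
        }
    }
}
```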
To make Hadoop data available to the business, the typical solution today is to:
1. Pre-process the data first in Hadoop using MapReduce to organize it into a consistent structure.
2. Move a pre-defined subset of data out of Hadoop into a relational data warehouse (Vertica, Oracle, Greenplum, Teradata, etc.) using ETL tools (Informatica, etc.) and scripts.
3. Connect a BI tool (MicroStrategy, Tableau, etc.) to certain tables in the database – often pre-defined aggregate tables purpose-built to optimize the BI queries.
If there is a change in the source data or a need for more or additional data, this entire pipeline has to be re-worked, which can take months – and again puts the burden on IT.
Voice: So, to review (the four requirements):
1. Handle big data.
2. Allow iterative, in-memory exploration through big data.
3. Collect everything – we can't anticipate the questions.
4. Remove the friction and complexity introduced with data warehouses and allow business users to self-service through the data.
The implication: if you are left using a data warehouse or a connector lifeline to legacy technology, you will be left behind.
(This is an animated slide – click to advance the animation before each talking point.)
Platfora is an application that optimizes an organization's existing Hadoop cluster.
(click) Platfora is installed in the same network as an existing Hadoop cluster, but on its own dedicated hardware. Platfora connects to Hadoop and utilizes its HDFS and MapReduce services.
(click) To make Hadoop data visible and discoverable to business users, Platfora administrators create a data catalog of the datasets residing in Hadoop. The data catalog is just a metadata description of the source data. No data is moved until a user requests it.
(click) When a business user finds data they want to explore, they can request it by defining a lens. A lens is a selection of fields chosen from one or more related datasets.
(click) To load the requested data into Platfora, users "build" a lens. A lens build initiates a series of MapReduce jobs in Hadoop to pull and process the requested data. The output of the MapReduce job is the lens, which is stored in HDFS and also loaded onto disk in Platfora.
(click) Once a lens is built, it can be queried by Platfora's built-in BI interface – the Vizboard. A lens is a columnar data structure that is loaded into memory to enable real-time interactive queries.
This workflow is flexible and iterative – if a business user needs additional data, they can update and refresh their lens on their own.
The workflow to go from raw data in Hadoop to interactive in-memory BI is achieved in one end-to-end platform.
(click) Connect Platfora to your Hadoop cluster and point to the source data you want to make available.
(click) Describe how the data is structured and related by defining and modeling datasets.
(click) Once the catalog is defined, users can request data from Hadoop by defining and building a lens. A lens can be thought of as an on-demand data mart that can be updated or deleted as needed. It is a selection and summarization of the source data, but the source data itself in Hadoop remains intact.
(click) Once the data is in lens format, it can be queried by Platfora's BI interface, the Vizboard. Lens data is loaded into memory to enable real-time, interactive data exploration.
Methodologies – Agile, Waterfall, etc.? What does an Eng org look like without a PGM?
Recognize when to 'steer' vs. 'mandate'. Judgment: anticipating problems and making trade-offs.
My parents wish I would've gone to law school – it would be so much easier to tell their friends that I'm a lawyer.
I had nothing to do with this!
Why yes, I know Word – what is the company again?
I write useless instructions that nobody reads, or sometimes I take emails written by Engineers and paste them into Word or FrameMaker or whatever the authoring tool du jour may be.
I am an information ninja, managing a complex web of information.
Actually, for about every hour I spend writing, I spend another 4 hours doing other things – like researching competitor products, industry terminology, and use cases, using the product, attending meetings, filing bugs, curating sample data, developing demos, everything else…
My mission is simple – people use software to make their job easier. I write information to help people use the software to do their job. Part of that is understanding your audience and writing to their level of experience.
Authoring tools
Graphic development tools
Collaboration tools
Cloud and infrastructure tools to install Platfora and Hadoop so we can test and document different use cases
The word “technical” comes first in technical writer
Personal goals / Life changes – getting married, moving home, having a kid, kid transitions (schools)
Personal goals / Life change examples: marriage, buying a house, moving home, having a kid, kid transitions (schools)
So I am going to talk about how I switched careers and got started in technology
Unlike young people today, I did not grow up with technology – there was no email, no laptops, no cell phones, no Facebook. I'm a little embarrassed to say that I did not use a computer until I was in college.
In college I was an English major – I was great at organizing my thoughts and could write a killer essay. But like many English majors, I had no idea how to use my skills to make a living.
So I got a job at Nordstrom. That job turned into a career, and I worked my way up the chain to an assistant buyer position.
I actually decided to change careers when I was offered a promotion – the job I had been coveting for 8 years – BUYER, yay! (in Arizona) oh!
I had cold feet. If I took that job, I realized that I was giving up what I really wanted to do. I wanted to write, I wanted to learn, and I wanted to make a difference. I wanted to be a geek!
At the time of my transition, it was the dot-com boom. I knew I wanted to be a part of that! San Francisco State had opened a new program in technical writing, and it was love at first task analysis outline! I also enrolled in their web development program. I read more "Dummies" books than I care to admit. I tried every technology and software I could get my hands on.
One problem – you can't get a job without experience, and you can't get experience without a job. So I found open source projects that did not have much documentation and offered to contribute. I did free web sites for small local businesses. I had a friend introduce me to someone she knew who worked for a web development start-up, and offered to intern for free. I joined the Society for Technical Communication (STC) and volunteered at their conferences and judging competitions – the competitions let me get my hands on examples of real-world technical documentation, and meet my more-established peers.
At the time there was no social media like we know it today – no LinkedIn or Twitter – but I put up a personal web site with articles and whitepapers about the industry, along with my portfolio pieces and a resume of the projects I had done. The purpose is the same – promote yourself and demonstrate what you know (or what you want people to think you know!).
Just like technology itself, my career has taken twists and turns – ups and downs. In every career, I think there are things that you didn't expect, things that maybe were a little uncomfortable at the time, but on the flip side you look back and say "that was awesome!" I call these "happy accidents".
My first happy accident was that internship that I landed at the web development consulting firm, Lante, which turned into my first job. They had never had a technical writer, I had never been a technical writer, but I had to figure out something to do for them! I started by sitting in on customer meetings and listening to what the customer wanted – I wrote requirements docs and use cases – and final documentation at the end of the project. It turned out that customer satisfaction was higher on projects that had documentation, and it led to more repeat engagements. I was hired to lead the business analysis and documentation portion of the practice, and built a team of writers.
My second happy accident was Securant. I was hired by a tech pubs manager who quit on my 4th day. He just stopped coming to work. I don't blame him. It was 4 weeks out from the first release. It was a start-up in complete chaos. There was no documentation at all. I knew nothing about web security. I was way over my head! It was horrible! I felt stupid! I had never worked harder in my life! But that experience accelerated my career growth 10 times over. At the time it sucked – looking back – it was awesome!
Greenplum was my third happy accident. I was reluctant to go because I did not know anything about big data or databases. I hadn't been at Oracle very long, and was worried about job hopping so soon. But the VP of Engineering at the time was a woman who had been a mentor to me at 2 prior companies. I decided to go to Greenplum because of her. Then she quit on my 4th day! The initial beta product didn't work and had to be re-architected. They had to lay off half the company while they regrouped. It was horrible! But in hindsight, it was awesome. I fell in love with big data. I got to be involved at an early stage with a company that went on to be really successful. I got to build a tech pubs and training organization from the ground up. And it was where I first got to work with Ben Werther, the founder of Platfora!
By value I mean being able to articulate the value you bring to a company and knowing what your job and skill level are worth in the market. Realize that you bring skills and experience to the table when changing careers or when changing roles within the same company.
What makes you excited to come to work every day? For me, I have to truly believe in the product or technology I am working on. I need a place where I am learning, and where I feel like my role matters in the success of the company. When you can identify what you need to be happy, you are less likely to wind up in a job that is a bad fit.
The majority of my career growth has come from doing something that has scared me or has made me feel uncomfortable. I remind myself of that when something is hard or I feel like I am not up to a challenge.
Even asking "do you have feedback for me?" – I used to write this in my notebook to remember to ask it at 1:1's.
Not just managing the message about what your team is doing or doing well. This means asking for what support looks like, or changing the conversation when you get promoted. Easy to get
The U.S. Department of Labor projects that by 2020, there will be 1.4 million computer specialist job openings. Yet U.S. universities are expected to produce only enough qualified graduates to fill 29% of these jobs.
For hiring managers, or people who influence hiring or participate in interviews: there are times that you meet someone and say, "wow, they are great, but they aren't a fit for my team or my company!" [Give personal story about woman at SFDC] Be sure to pass them along directly to another hiring manager or a friend – as many of you know, it is easy to get lost in the hiring machine, especially at a big company.