Big data projects often fail to deliver on the promise of working out of the box, for any data, with regular DBAs. Extracting data from legacy systems into big data systems can mean losing the original systems' controls and governance around security, access, and metadata. It is important to have the hoses securely attached before opening the fire hydrant: map security and controls and develop a shared vocabulary before combining disparate data sources or implementing complex big data technologies.
DataEngConf SF16 - Three lessons learned from building a production machine l... – Hakka Labs
This document discusses three lessons learned from building machine learning systems at Stripe.
1. Don't treat models as black boxes. Early on, Stripe focused only on training with more data and features without understanding algorithms, results, or deeper reasons behind results. This led to overfitting. Introspecting models using "score reasons" helped debug issues.
2. Have a plan for counterfactual evaluation before production. Stripe's validation results did not predict poor production performance because the environment changed. Counterfactual evaluation using A/B testing with probabilistic reversals of block decisions allows estimating true precision and recall.
3. Invest in production monitoring of models. Monitoring inputs, outputs, action rates, score
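Lesson 2's "probabilistic reversals of block decisions" can be made concrete: let a small random fraction of would-be blocks through, log each charge's propensity of being allowed, and use inverse-propensity weighting to estimate precision over everything the model would have blocked. A minimal sketch under invented assumptions (the threshold, reversal rate, and function names are illustrative, not Stripe's actual system):

```python
import random

THRESHOLD = 0.8       # hypothetical fraud-score block threshold
REVERSAL_PROB = 0.05  # fraction of would-be blocks allowed through anyway

def decide(score):
    """Return the action taken and the probability that this charge
    was allowed through (its propensity), which gets logged."""
    if score < THRESHOLD:
        return "allow", 1.0
    # The model wants to block; probabilistically reverse the decision
    # so the true outcome (fraud or not) stays observable.
    action = "allow" if random.random() < REVERSAL_PROB else "block"
    return action, REVERSAL_PROB

def estimated_precision(outcomes):
    """outcomes: (propensity, was_fraud) pairs for the reversed blocks
    we got to observe. Inverse-propensity weighting estimates the fraud
    rate (precision) over the full population the model would block."""
    total = sum(1.0 / p for p, _ in outcomes)
    fraud = sum(f / p for p, f in outcomes)
    return fraud / total if total else 0.0

# Hypothetical observed outcomes: 4 of 5 reversed charges were fraud.
sample = [(0.05, 1), (0.05, 1), (0.05, 0), (0.05, 1), (0.05, 1)]
print(round(estimated_precision(sample), 2))  # 0.8
```

With equal propensities the estimate reduces to the plain fraud rate among reversed charges; unequal propensities (e.g. score-dependent reversal rates) are where the weighting matters.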
From Data Analytics to Fast Data Intelligence – Trieu Nguyen
1) How to understand users with Data Analytics?
2) How to build a Real-time Music Recommender System from a Data Stream?
3) How to boost profit with Cross-Sale in Real-time?
Key Ideas to build a Fast Data Intelligence Platform from Open Source Tools:
+ Apache Kafka
+ Apache Spark
+ RFX framework
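The Kafka-to-Spark pattern behind such a platform boils down to aggregating an event stream in time windows. A stdlib-only sketch of that core aggregation (the event data, window size, and names are invented for illustration; a real deployment would consume a Kafka topic and run the aggregation as a Spark job):

```python
from collections import Counter

# Simulated play events standing in for a Kafka topic:
# (timestamp_sec, song_id). All values are illustrative.
events = [
    (0, "song-a"), (1, "song-b"), (2, "song-a"),
    (11, "song-a"), (12, "song-c"), (19, "song-a"),
]

WINDOW = 10  # tumbling 10-second windows

def windowed_play_counts(stream):
    """Bucket events into tumbling windows and count plays per song --
    the aggregation behind a real-time 'trending now' recommender."""
    windows = {}
    for ts, song in stream:
        windows.setdefault(ts // WINDOW, Counter())[song] += 1
    return windows

for bucket, counts in sorted(windowed_play_counts(events).items()):
    print(bucket, counts.most_common(1)[0])  # top song per window
```

The same shape (key by window, reduce by count) carries over directly to Spark's windowed aggregations over a Kafka source.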
UX Analytics for Data-driven Product Development – Trieu Nguyen
- UX analytics can help companies turn their user data into real products by discovering user interests in real-time.
- Mobile analytics is important because mobile devices are becoming the dominant way users access the web, and big data and analytics are major trends.
- Core KPIs for mobile analytics include users, sessions, events, and other metrics to understand user behavior and how to engage app users.
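Of the KPIs above, sessions are the least direct: they must be derived from raw events. A small sketch of standard sessionization, assuming a 30-minute inactivity timeout (the timeout value and function name are illustrative):

```python
SESSION_TIMEOUT = 30 * 60  # 30-minute inactivity gap, a common default

def count_sessions(timestamps):
    """Derive the 'sessions' KPI from one user's raw event timestamps
    (in seconds): any gap longer than the timeout starts a new session."""
    sessions, last = 0, None
    for ts in sorted(timestamps):
        if last is None or ts - last > SESSION_TIMEOUT:
            sessions += 1
        last = ts
    return sessions

# Three bursts of activity separated by more than 30 minutes
print(count_sessions([0, 60, 120, 4000, 4100, 10000]))  # 3
```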
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S... – Work-Bench
This document summarizes a statistician's experience working at a healthcare technology startup that uses electronic health record data. It describes how the company initially had just one quantitative scientist but grew its team to include 70 software engineers and 10 quantitative scientists. It discusses how the company cultivated an R culture through internal packages, training, and hiring. It provides examples of when the company uses R for prototyping but implements in other languages for production, when R is used as a long-term solution, and when R and other languages are used in parallel for analysis.
Reactive Realtime Big Data with Open Source Lambda Architecture - TechCampVN 2014 – Trieu Nguyen
This document discusses using a reactive lambda architecture with open source tools to solve real-time big data problems. It begins by defining big data and explaining that simply having data is not enough - you need to solve the right problems with the right team and tools. It then presents three example problems that could benefit from real-time big data solutions: disaster prediction and response, understanding customers through social media data, and optimizing marketing campaigns in real-time. The document proposes using a reactive lambda architecture along with open source frameworks like Hadoop, Spark and Storm, and storage systems like Redis, HDFS and HBase, to build streaming data pipelines and query data in real-time. It demonstrates this through a social media user tracking and personalized recommendations use case.
Cutting through the noise – a digital space to help line managers - Sarah Mof... – Intranet Now
This document discusses the development of Arthur, an internal social media platform launched by ACCA in 2013. It was initially aimed at ACCA's 400 line managers to provide a centralized place for news and information. The platform later expanded and was renamed to help people managers by providing curated content on topics like coaching, communication, and motivation. Over time, the platform refined its content personalization, expanded its readership, and strengthened its measurement of user activity and roles. The goal is for the platform to continue cutting through noise and serving as a core information channel for managers.
Beyond Data Discovery: The Value Unlocked by Modern Data Modeling – Looker
In this webinar we will discuss Looker’s novel approach to data modeling and how it powers a data exploration environment with unprecedented depth and agility.
Some topics we will cover:
-A new architecture beyond direct connect
-Language-based, git-integrated data modeling
-Abstractions that make SQL more powerful and more efficient
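The "abstractions that make SQL more powerful" idea can be illustrated with a toy declarative model compiled down to executable SQL. The model format below is invented for illustration and is not LookML; it just shows the pattern of defining dimensions and measures once and generating queries from them:

```python
import sqlite3

# A toy declarative model: dimensions map to columns, measures to
# aggregate expressions. All names here are hypothetical.
model = {
    "table": "orders",
    "dimensions": {"status": "status"},
    "measures": {"order_count": "COUNT(*)", "revenue": "SUM(amount)"},
}

def to_sql(model, dims, measures):
    """Expand requested dimensions and measures into a GROUP BY query."""
    select = [f"{model['dimensions'][d]} AS {d}" for d in dims]
    select += [f"{model['measures'][m]} AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {model['table']}"
    if dims:
        sql += " GROUP BY " + ", ".join(model["dimensions"][d] for d in dims)
    return sql

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("paid", 10.0), ("paid", 5.0), ("refunded", 10.0)])
sql = to_sql(model, ["status"], ["order_count", "revenue"])
print(sorted(conn.execute(sql).fetchall()))
```

The point of the abstraction: analysts pick dimensions and measures, and consistent, correct SQL is generated for them rather than hand-written per query.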
Data Con LA 2020
Description
Coming from a deep belief in data democratization, I believe that for any team to be successful collaborators, it must be data-centric and data should be accessible to all.
*To ensure that your team, whether software-engineering-centric or not, has maximum efficiency, data should be visible and the data lake should be accessible.
*Form a database for analytics summaries; discuss the different technologies (SQL, NoSQL), cost of deployment, need, and team-driven structure. Build an API for this database for external and inter-team crosstalk.
*Build an analytics and visualization layer on top of it (Flask/Django/Node, etc.) to give the team high visibility into their analysis and a faster turnaround of data.
*Discuss an easy way of enabling the team to run code, whether locally or in the cloud; JupyterHub is a great way of doing so, and it adds tremendous value and potential.
*Cover the common tools used for version control, CI/CD, coding technologies, etc.
*Finally, summarize how the mix of all these tools and technologies ensures maximum efficiency.
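The "database for analytics summaries" step above can be sketched with stdlib sqlite3: roll raw events up into a summary table that an API layer (Flask/Django/Node) would then serve to other teams. Table names, columns, and the endpoint shape are all illustrative:

```python
import sqlite3

# In-memory stand-in for the analytics-summaries database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day TEXT, user_id TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    ("2020-10-01", "u1"), ("2020-10-01", "u2"),
    ("2020-10-01", "u1"), ("2020-10-02", "u3"),
])

# Roll raw events up into a per-day summary table.
conn.execute("""
    CREATE TABLE daily_summary AS
    SELECT day,
           COUNT(*)                AS events,
           COUNT(DISTINCT user_id) AS active_users
    FROM events GROUP BY day
""")

def get_summary(day):
    """The query a hypothetical /summary/<day> API endpoint might run."""
    row = conn.execute(
        "SELECT events, active_users FROM daily_summary WHERE day = ?",
        (day,)).fetchone()
    return {"day": day, "events": row[0], "active_users": row[1]}

print(get_summary("2020-10-01"))
```

Serving precomputed summaries rather than raw events is what keeps the cross-team API cheap to query and safe to expose broadly.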
Speaker
Nawar Khabbaz, Rivian, Data Engineer
This document discusses DevOps and how it can accelerate innovation while reducing risk. DevOps combines software development ("Dev") and IT operations ("Ops") practices to shorten the systems development life cycle and provide continuous delivery. It allows organizations to develop and release software faster and more reliably. DevOps reduces risk by decreasing the likelihood and impact of adverse events through practices like continuous integration, automated testing, and infrastructure as code which allow for faster recovery from failures and security issues.
To rephrase an old saying: ‘It takes a village to raise an Analyst.’ Data Analysts and Scientists are working in teams delivering insight and analysis on an ongoing basis. So how do you get the team to support experimentation and insight delivery without ending up in an IT Engineer vs Analyst vs Data Governance war? We present 5 shocking steps to get these teams of people working together with practical, doable steps that can help you achieve data agility. The speaker has decades of hands on and executive management experience in data, analytics, and software development.
See the recording at http://looker.com/learn#ufh-i-225858450-driving-data-democracy-hadoop-amazon-redshift
The Hadoop ecosystem has improved markedly over the past few years. Moreover, MPP databases seem to slot in nicely as complementary tools to map-reduce batch jobs, in that they allow analytics teams to easily query massive structured data sets.
Rex Gibson, Manager of Data Engineering at Knewton and Scott Hoover, Data Scientist at Looker walk through how these pipelines work. They discuss:
- their technology and data stacks
- possible drawbacks to Hadoop + Redshift
- the merits and drawbacks associated with making data processing and querying more “democratic.”
This document discusses the technical architecture work package for the ViBRANT project. It covers hosting architecture, failover and mirroring, providing technical support, multisite integration, a dynamic site registry, measuring and publishing data usage, developing a citation metric, integrating Scratchpads, prioritizing development, code testing, managing training resources, developing a financial model for sustainability, and providing a service level agreement for users.
How The Economist with Cloud BI and Looker have improved data-driven decision... – Looker
This session by The Economist Group, Cloud BI Ltd and Looker explores the challenges of data-driven decision making and how powerful the approach can be. Hear how the solution was implemented quickly and evolved in the cloud and the benefits of being able to see and understand customer preferences through a 360-degree view.
Where is my big data: security, privacy and jurisdictions in the cloud – Chris Swan
This document summarizes Chris Swan's presentation on big data security, privacy, and jurisdiction in the cloud. The presentation covers Swan's background in technology, defines big data, discusses cloud security concerns and challenges of regulation across jurisdictions. It concludes by suggesting some steps individuals can take to protect their data, such as only using services from providers with strong privacy policies and avoiding services from countries with surveillance laws that compromise privacy.
Watch this webinar in full here: https://buff.ly/2MVTKqL
Self-Service BI promises to remove the bottleneck that exists between IT and business users. The truth is, if data is handed over to a wide range of data consumers without proper guardrails in place, it can result in data anarchy.
Attend this session to learn why data virtualization:
• Is a must for implementing the right self-service BI
• Makes self-service BI useful for every business user
• Accelerates any self-service BI initiative
The Maturity Model: Taking the Growing Pains Out of Hadoop – Inside Analysis
The Briefing Room with Rick van der Lans and Think Big, a Teradata Company
Live Webcast on June 16, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=197f8106531874cc5c14081ca214eaff
Hadoop is arguably one of the most disruptive technologies of the last decade. Once lauded solely for its ability to transform the speed of batch processing, it has marched steadily forward and promulgated an array of performance-enhancing accessories, notably Spark and YARN. Hadoop has evolved into much more than a file system and batch processor, and it now promises to stand as the data management and analytics backbone for enterprises.
Register for this episode of The Briefing Room to learn from veteran Analyst Rick van der Lans, as he discusses the emerging roles of Hadoop within the analytics ecosystem. He’ll be briefed by Ron Bodkin of Think Big, a Teradata Company, who will explore Hadoop’s maturity spectrum, from typical entry use cases all the way up the value chain. He’ll show how enterprises that already use Hadoop in production are finding new ways to exploit its power and build creative, dynamic analytics environments.
Visit InsideAnalysis.com for more information.
Architecting a Data Platform For Enterprise Use (Strata NY 2018) – mark madsen
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build multi-use data infrastructure that is not subject to past constraints. This tutorial covers design assumptions, design principles, and how to approach the architecture and planning for multi-use data infrastructure in IT.
Long:
The goal in most organizations is to build multi-use data infrastructure that is not subject to past constraints. This session will discuss hidden design assumptions, review design principles to apply when building multi-use data infrastructure, and provide a reference architecture to use as you work to unify your analytics infrastructure.
The focus in our market has been on acquiring technology, and that ignores the more important part: the larger IT landscape within which this technology lives and the data architecture that lies at its core. If one expects longevity from a platform then it should be a designed rather than accidental architecture.
Architecture is more than just software. It starts from use and includes the data, technology, methods of building and maintaining, and organization of people. What are the design principles that lead to good design and a functional data architecture? What are the assumptions that limit older approaches? How can one integrate with, migrate from or modernize an existing data environment? How will this affect an organization's data management practices? This tutorial will help you answer these questions.
Topics covered:
* A brief history of data infrastructure and past design assumptions
* Categories of data and data use in organizations
* Data architecture
* Functional architecture
* Technology planning assumptions and guidance
Driving Real Insights Through Data Science – VMware Tanzu
Major changes in industries have been brought about by the emergence of data-driven discoveries and applications. Many organizations are bringing together their data and looking to drive change. But the ability to generate new insights in real time from massive sets of data is still far from commonplace.
At this event, data technology experts and data scientists from Pivotal provided the latest business perspective on how data science and engineering can be used to accelerate the generation of new insights.
For information about upcoming Pivotal events, please visit: http://pivotal.io/news-events/#events
Architecting a Platform for Enterprise Use - Strata London 2018 – mark madsen
The goal in most organizations is to build multi-use data infrastructure that is not subject to past constraints. This session will discuss hidden design assumptions, review design principles to apply when building multi-use data infrastructure, and provide a reference architecture to use as you work to unify your analytics infrastructure.
The focus in our market has been on acquiring technology, and that ignores the more important part: the larger IT landscape within which this technology lives and the data architecture that lies at its core. If one expects longevity from a platform then it should be a designed rather than accidental architecture.
Architecture is more than just software. It starts from use and includes the data, technology, methods of building and maintaining, and organization of people. What are the design principles that lead to good design and a functional data architecture? What are the assumptions that limit older approaches? How can one integrate with, migrate from or modernize an existing data environment? How will this affect an organization's data management practices? This tutorial will help you answer these questions.
Topics covered:
* A brief history of data infrastructure and past design assumptions
* Categories of data and data use in organizations
* Analytic workload characteristics and constraints
* Data architecture
* Functional architecture
* Tradeoffs between different classes of technology
* Technology planning assumptions and guidance
#strataconf
2014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.0 – Joakim Lindbom
Corporations are struggling with overly complex systems and system landscapes. DevOps is presented as one piece of the puzzle for moving to much leaner and simpler landscapes, all in order to increase readiness for change and innovation.
The presentation also discusses the basic thought error behind organising according to Design-Build-Run, which is the basis for most ICT IM outsourcing.
Data technology experts from Pivotal give the latest perspective on how big data analytics and applications are transforming organizations across industries.
This event provides an opportunity to learn about new developments in the rapidly-changing world of big data and understand best practices in creating Internet of Things (IoT) applications.
Learn more about the Pivotal Big Data Roadshow: http://pivotal.io/big-data/data-roadshow
Fit For Purpose: Preventing a Big Data Letdown – Inside Analysis
The Briefing Room with Dr. Robin Bloor and RedPoint Global
Live Webcast October 6, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=9982ad3a2603345984895f279e849d35
Gartner recently placed Big Data in its “trough of disillusionment,” reflective of many leaders’ struggle to prove the value of Hadoop within their organization. While the promise of enhanced data integration and enrichment is obvious, measurable results have remained elusive. This episode of The Briefing Room will outline how to successfully tie Big Data to existing business applications, preventing your next Hadoop project from being another “Big Data letdown.”
Register today to learn from veteran Analyst Dr. Robin Bloor as he discusses the importance of converging enterprise data integration with intelligence and scalability. He’ll be briefed by George Corugedo of RedPoint Global, who will provide concrete examples of how the convergence of scalable cloud platforms, ever-expanding data sources and intelligent execution can turn the Big Data hype into demonstrable business value.
Visit InsideAnalysis.com for more information.
Lewis Crawford's presentation from the BI Boss event in Leeds, focussing on our perspective on Big Data, Big Data projects, what to avoid, and how to make it work for you.
Top Business Intelligence Trends for 2016 – Panorama Software
10 top BI trends for 2016 – by Panorama
It's all about the insight
Visual perception rules
The learning suggestive system - AI gets real
The data product chain becomes democratized
Cloud (finally)
“Mobile”
Automated data integration
Internet of Things data accelerating into reality
Hadoop accelerators are the last chance for Hadoop
Fading of the centralized on-premise DWH
Cutting through the noise – a digital space to help line managers - Sarah Mof...Intranet Now
This document discusses the development of Arthur, an internal social media platform launched by ACCA in 2013. It was initially aimed at ACCA's 400 line managers to provide a centralized place for news and information. The platform later expanded and was renamed to help people managers by providing curated content on topics like coaching, communication, and motivation. Over time, the platform refined its content personalization, expanded its readership, and strengthened its measurement of user activity and roles. The goal is for the platform to continue cutting through noise and serving as a core information channel for managers.
Beyond Data Discovery: The Value Unlocked by Modern Data ModelingLooker
In this webinar we will discuss Looker’s novel approach to data modeling and how it powers a data exploration environment with unprecedented depth and agility.
Some topics we will cover:
-A new architecture beyond direct connect
-Language-based, git-integrated data modeling
-Abstractions that make SQL more powerful and more efficient
Data Con LA 2020
Description
Coming from a grand belief of data democratization, I believe that in order for any team to be successful collaborators, it has to be data centric and data should be accessible to all.
*To ensure that your non software or software engineering centric team has maximum efficiency, data should be visible, data lake should be accessible.
*Form a database for analytics summaries, talk about the different technologies(SQL, NoSQL) cost of deployment, need, team driven structure. Build an API for this database for external/inter team crosstalk.
*Build analytics and visual layer on top of it. Flask/Django/Node, etc.., to enable the team to have high visibility in their analysis, and to ensure a higher turnaround of data.
*Talk about an easy way of enabling the team to run code, could be local/cloud, JupyterHub is a great way of doing so, talk about the tremendous value added in that and the potential it enables
*Talk about the common tools user for version control/CICD/Coding technologies, etc..
*Finally summarize the value of the mixture of all these tools and technologies in order to ensure the maximum efficiency.
Speaker
Nawar Khabbaz, Rivian, Data Engineer
This document discusses DevOps and how it can accelerate innovation while reducing risk. DevOps combines software development ("Dev") and IT operations ("Ops") practices to shorten the systems development life cycle and provide continuous delivery. It allows organizations to develop and release software faster and more reliably. DevOps reduces risk by decreasing the likelihood and impact of adverse events through practices like continuous integration, automated testing, and infrastructure as code which allow for faster recovery from failures and security issues.
To rephrase an old saying: ‘It takes a village to raise an Analyst.’ Data Analysts and Scientists are working in teams delivering insight and analysis on an ongoing basis. So how do you get the team to support experimentation and insight delivery without ending up in an IT Engineer vs Analyst vs Data Governance war? We present 5 shocking steps to get these teams of people working together with practical, doable steps that can help you achieve data agility. The speaker has decades of hands on and executive management experience in data, analytics, and software development.
See the recording at http://looker.com/learn#ufh-i-225858450-driving-data-democracy-hadoop-amazon-redshift
The Hadoop ecosystem has improved markedly over the past few years. Moreover, MPP databases seem to slot in nicely as complementary tools to map-reduce batch jobs, in that they allow analytics teams to easily query massive structured data sets.
Rex Gibson, Manager of Data Engineering at Knewton and Scott Hoover, Data Scientist at Looker walk through how these pipelines work. They discuss:
- their technology and data stacks
- possible drawbacks to Hadoop + Redshift
- the merits and drawbacks associated with making data processing and querying more “democratic.”
This document discusses the technical architecture work package for the ViBRANT project. It covers hosting architecture, failover and mirroring, providing technical support, multisite integration, a dynamic site registry, measuring and publishing data usage, developing a citation metric, integrating Scratchpads, prioritizing development, code testing, managing training resources, developing a financial model for sustainability, and providing a service level agreement for users.
How the economist with cloud BI and Looker have improved data-driven decision...Looker
This session by The Economist Group, Cloud BI Ltd and Looker explores the challenges of data-driven decision making and how powerful the approach can be. Hear how the solution was implemented quickly and evolved in the cloud and the benefits of being able to see and understand customer preferences through a 360-degree view.
Where is my big data: security, privacy and jurisdictions in the cloudChris Swan
This document summarizes Chris Swan's presentation on big data security, privacy, and jurisdiction in the cloud. The presentation covers Swan's background in technology, defines big data, discusses cloud security concerns and challenges of regulation across jurisdictions. It concludes by suggesting some steps individuals can take to protect their data, such as only using services from providers with strong privacy policies and avoiding services from countries with surveillance laws that compromise privacy.
Watch this webinar in full here: https://buff.ly/2MVTKqL
Self-Service BI promises to remove the bottleneck that exists between IT and business users. The truth is, if data is handed over to a wide range of data consumers without proper guardrails in place, it can result in data anarchy.
Attend this session to learn why data virtualization:
• Is a must for implementing the right self-service BI
• Makes self-service BI useful for every business user
• Accelerates any self-service BI initiative
The Maturity Model: Taking the Growing Pains Out of HadoopInside Analysis
The Briefing Room with Rick van der Lans and Think Big, a Teradata Company
Live Webcast on June 16, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=197f8106531874cc5c14081ca214eaff
Hadoop is arguably one of the most disruptive technologies of the last decade. Once lauded solely for its ability to transform the speed of batch processing, it has marched steadily forward and promulgated an array of performance-enhancing accessories, notably Spark and YARN. Hadoop has evolved into much more than a file system and batch processor, and it now promises to stand as the data management and analytics backbone for enterprises.
Register for this episode of The Briefing Room to learn from veteran Analyst Rick van der Lans, as he discusses the emerging roles of Hadoop within the analytics ecosystem. He’ll be briefed by Ron Bodkin of Think Big, a Teradata Company, who will explore Hadoop’s maturity spectrum, from typical entry use cases all the way up the value chain. He’ll show how enterprises that already use Hadoop in production are finding new ways to exploit its power and build creative, dynamic analytics environments.
Visit InsideAnalysis.com for more information.
Architecting a Data Platform For Enterprise Use (Strata NY 2018)mark madsen
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build multi-use data infrastructure that is not subject to past constraints. This tutorial covers design assumptions, design principles, and how to approach the architecture and planning for multi-use data infrastructure in IT.
Long:
The goal in most organizations is to build multi-use data infrastructure that is not subject to past constraints. This session will discuss hidden design assumptions, review design principles to apply when building multi-use data infrastructure, and provide a reference architecture to use as you work to unify your analytics infrastructure.
The focus in our market has been on acquiring technology, and that ignores the more important part: the larger IT landscape within which this technology lives and the data architecture that lies at its core. If one expects longevity from a platform then it should be a designed rather than accidental architecture.
Architecture is more than just software. It starts from use and includes the data, technology, methods of building and maintaining, and organization of people. What are the design principles that lead to good design and a functional data architecture? What are the assumptions that limit older approaches? How can one integrate with, migrate from or modernize an existing data environment? How will this affect an organization's data management practices? This tutorial will help you answer these questions.
Topics covered:
* A brief history of data infrastructure and past design assumptions
* Categories of data and data use in organizations
* Data architecture
* Functional architecture
* Technology planning assumptions and guidance
Driving Real Insights Through Data ScienceVMware Tanzu
Major changes in industries have been brought about by the emergence of data-driven discoveries and applications. Many organizations are bringing together their data, and looking to drive change. But the ability to generate new insights in real time from a massive sets of data is still far from commonplace.
At this event, data technology experts and data scientists from Pivotal provided the latest business perspective on how data science and engineering can be used to accelerate the generation of new insights.
For information about upcoming Pivotal events, please visit: http://pivotal.io/news-events/#events
Architecting a Platform for Enterprise Use - Strata London 2018mark madsen
The goal in most organizations is to build multi-use data infrastructure that is not subject to past constraints. This session will discuss hidden design assumptions, review design principles to apply when building multi-use data infrastructure, and provide a reference architecture to use as you work to unify your analytics infrastructure.
The focus in our market has been on acquiring technology, and that ignores the more important part: the larger IT landscape within which this technology lives and the data architecture that lies at its core. If one expects longevity from a platform then it should be a designed rather than accidental architecture.
Architecture is more than just software. It starts from use and includes the data, technology, methods of building and maintaining, and organization of people. What are the design principles that lead to good design and a functional data architecture? What are the assumptions that limit older approaches? How can one integrate with, migrate from or modernize an existing data environment? How will this affect an organization's data management practices? This tutorial will help you answer these questions.
Topics covered:
* A brief history of data infrastructure and past design assumptions
* Categories of data and data use in organizations
* Analytic workload characteristics and constraints
* Data architecture
* Functional architecture
* Tradeoffs between different classes of technology
* Technology planning assumptions and guidance
#strataconf
2014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.0Joakim Lindbom
Corporations are struggling with overly complex systems and system landscapes. DevOps is presented as one piece of the puzzle to go for much leaner and simpler landscapes - all in order to increase the readiness for change and innovation.
The presentation also discusses the basic thought error behind organising according to Design-Build-Run, which is the basis for most ICT IM outsourcing.
Data technology experts from Pivotal give the latest perspective on how big data analytics and applications are transforming organizations across industries.
This event provides an opportunity to learn about new developments in the rapidly-changing world of big data and understand best practices in creating Internet of Things (IoT) applications.
Learn more about the Pivotal Big Data Roadshow: http://pivotal.io/big-data/data-roadshow
Fit For Purpose: Preventing a Big Data LetdownInside Analysis
The Briefing Room with Dr. Robin Bloor and RedPoint Global
Live Webcast October 6, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=9982ad3a2603345984895f279e849d35
Gartner recently placed Big Data in its “trough of disillusionment,” reflective of many leaders’ struggle to prove the value of Hadoop within their organization. While the promise of enhanced data integration and enrichment is obvious, measurable results have remained elusive. This episode of The Briefing Room will outline how to successfully tie Big Data to existing business applications, preventing your next Hadoop project from being another “Big Data letdown.”
Register today to learn from veteran Analyst Dr. Robin Bloor as he discusses the importance of converging enterprise data integration with intelligence and scalability. He’ll be briefed by George Corugedo of RedPoint Global, who will provide concrete examples of how the convergence of scalable cloud platforms, ever-expanding data sources and intelligent execution can turn the Big Data hype into demonstrable business value.
Visit InsideAnalysis.com for more information.
Lewis Crawford's presentation from the BI Boss event in Leeds, focussing on our perspective on Big Data, Big Data projects, what to avoid, and how to make it work for you.
Top Business Intelligence Trends for 2016 by Panorama SoftwarePanorama Software
10 top BI trends for 2016 – by Panorama
Its all about the insight
Visual perception rules
The learning suggestive system - AI gets real
The data product chain becomes democratized
Cloud (finally)
“Mobile”
Automated data integration
Internet of Things data accelerating into reality
Hadoop accelerators are the last chance for Hadoop
Fading of the centralized on-premise DWH
There are 250 Database products, are you running the right one?Aerospike, Inc.
This webinar discusses choosing the right database for organizations. It will cover industry trends driving data and database evolution, real-world use cases where speed and scale are important, and an architecture overview. Speakers from Forrester and Aerospike will discuss how new applications are challenging traditional databases and how Aerospike's in-memory database provides extremely high performance for large-scale, data-intensive workloads. The agenda includes an industry overview, tips for choosing a database, how data has evolved, examples where low latency is critical, and a question and answer session.
A Connected Data Landscape: Virtualization and the Internet of ThingsInside Analysis
The Briefing Room with Dr. Robin Bloor and Cisco
Live Webcast March 3, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=a75f0f379405de155800a37b2bf104db
Data at rest, data in motion - regardless of its trajectory, data remains the lifeblood of today's information economy. But finding a way to bridge old systems with new opportunities requires an innovative data strategy, one that takes advantage of multiple processing technologies. With the optimal architecture in place, companies can harness years of work in traditional information systems, while opening the door to the flood of new data sources available.
Register for this episode of The Briefing Room to learn from veteran Analyst Dr. Robin Bloor, as he explains how data virtualization and other data technologies fundamentally change what's possible with data access, movement and analysis. He'll be briefed by David Besemer of Cisco, who will discuss how this new kind of data strategy can enable the integration of legacy systems, Cloud computing and the Internet of Things. He'll also answer questions about how Big Data and the IoT are helping to redefine the practice of data management.
Visit InsideAnalysis.com for more information.
S ba0881 big-data-use-cases-pearson-edge2015-v7Tony Pearson
IBM is a market leader in big data and analytics solutions. This session explains the basics of Big Data, with actual use cases of clients who have benefited from IBM solutions in this space, followed by architectures with IBM BigInsights, BigSQL, Platform Symphony and Spectrum Scale.
Fri benghiat gil-odsc-data-kitchen-data science to dataopsDataKitchen
This document outlines seven steps for transitioning from data science to data operations (DataOps):
1. Orchestrate the data science and production workflows.
2. Add testing at each step to monitor quality.
3. Use a version control system to manage code changes.
4. Implement branching and merging to allow parallel development.
5. Maintain separate environments for experiments, development and production.
6. Containerize components and practice environment version control.
7. Parameterize processes to increase flexibility and reuse.
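The seven steps above can be sketched in miniature. The following pure-Python sketch is a toy illustration, not DataKitchen's implementation; all step names, parameters and thresholds are hypothetical. It shows orchestration (step 1), a quality test after each step (step 2) and parameterization (step 7):

```python
# Minimal DataOps-style pipeline sketch: orchestrated steps,
# a quality test after each step, and parameterized behaviour.
# Step names and thresholds are hypothetical.

def extract(params):
    # Stand-in for reading rows from a source system.
    return list(range(params["n_rows"]))

def transform(rows, params):
    # Stand-in for a transformation step.
    return [r * params["scale"] for r in rows]

def quality_check(rows, min_rows):
    # Step 2: test each step's output before moving on.
    if len(rows) < min_rows:
        raise ValueError(f"expected at least {min_rows} rows, got {len(rows)}")
    return rows

def run_pipeline(params):
    # Step 1: orchestrate the steps in order.
    # Step 7: behaviour is driven entirely by params, so the same
    # code can serve development and production environments.
    rows = quality_check(extract(params), params["min_rows"])
    return quality_check(transform(rows, params), params["min_rows"])

dev_params = {"n_rows": 10, "scale": 2, "min_rows": 1}
result = run_pipeline(dev_params)
```

Under this setup, swapping `dev_params` for a production parameter set (step 5) reuses the same pipeline code unchanged.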
WSO2Con USA 2015: Keynote - Helping You Connect the WorldWSO2
The document discusses Sanjiva Weerawarana, founder and CEO of WSO2, and his vision for the company: thinking long-term and building a comprehensive middleware platform rather than chasing hype. It also outlines WSO2's product strategy updates to support microservices, containers, cloud, analytics, mobile/IoT, and the consumerization of IT through a series of new and updated products.
The Analytic Platform: Empowering the Business NowInside Analysis
The Briefing Room with Dr. Robin Bloor and Actuate
Live Webcast on October 7, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=475312d15f46d095797f5842de84925f
As businesses grapple with more and more data, analysts and data consumers have a growing expectation to get at those assets fast. All too often, business users are stymied by governance and performance roadblocks, making time-to-insight a relatively slow process. One solution is to leverage the power of an analytic platform, one that keeps data management in IT’s hands, and lets business analysts jump right in without the need for modeling and provisioning.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor as he explains the principles behind a meaningful analytic platform. He’ll be briefed by Peter Hoopes and Allen Bonde of Actuate, who will tout their company’s BIRT Analytics, a solution that combines columnar database technology with pre-built algorithms and puts analytics in the hands of the business user in minutes, not days. They will show how their platform makes it easy to perform complex analytics on enterprise data and visualize results, without slowing down other systems or interfering with governance needs.
Visit InsideAnalysis.com for more information.
Similar to Big data debunking some of the myths (20)
LNETM - Atsign - Privacy with Personal Data ServicesChris Swan
London Enterprise Technology Meetup (LNETM) presentation on Atsign's atPlatform, which uses personal data services (PDS) and end-to-end encryption to build privacy-preserving applications for everybody, every organisation and everyTHING.
SOOCon24 - Showing that you care about security - OpenSSF ScorecardsChris Swan
Open Source Security Foundation (OpenSSF) Scorecards provide a way for open source users to determine whether maintainers are being diligent about securing their link in the software security supply chain. Practices such as pinning dependencies, branch protection, required reviews, continuous integration tests etc. are measured to provide a score and accompanying badge.
This presentation will provide a walkthrough of the steps involved in securing a first repository, and then what it takes to repeat that process across an organization with multiple repos. It will also look at the ongoing maintenance involved once scorecards have been implemented, and how aspects of that maintenance can be better automated to minimize toil.
All Day DevOps 2023 - Implementing OSSF Scorecards Across an Organisation.pdfChris Swan
Open Source Security Foundation (OpenSSF) Scorecards provide a way for open source users to determine whether maintainers are being diligent about securing their link in the software security supply chain. Practices such as pinning dependencies, branch protection, required reviews, continuous integration tests etc. are measured to provide a score and accompanying badge.
This presentation will provide a walkthrough of the steps involved in securing a first repository, and then what it takes to repeat that process across an organization with multiple repos. It will also look at the ongoing maintenance involved once scorecards have been implemented, and how aspects of that maintenance can be better automated to minimize toil.
Fluttercon Berlin 23 - Dart & Flutter on RISC-VChris Swan
Arm has dominated the mobile space since the dawn of smartphones, but systems based on the open source RISC-V instruction set architecture will bring new choices for manufacturers and us, their customers. RISC-V SDKs showed up in the Dart dev channel in Apr 22, but it's still pretty hard to build stuff due to lots of missing dependencies. As always happens with new stuff, the hardware people are waiting for broader software support, and the software people are waiting for a larger hardware installed base. This talk examines the forces that are driving RISC-V forward, and what developers can expect from a world that will have RISC-V devices, mobile phones, tablets and cloud services.
QConNY 2023 - Implementing OSSF Scorecards Across an OrganisationChris Swan
Open Source Security Foundation (OpenSSF) Scorecards provide a way for open source users to determine whether maintainers are being diligent about securing their link in the software security supply chain. Practices such as pinning dependencies, branch protection, required reviews, continuous integration tests etc. are measured to provide a score and accompanying badge.
This presentation will provide a walkthrough of the steps involved in securing a first repository, and then what it takes to repeat that process across an organization with multiple repos. It will also look at the ongoing maintenance involved once scorecards have been implemented, and how aspects of that maintenance can be better automated to minimize toil.
Flutter SV Meetup Oct 2022 - End to end encrypted IoT with Dart and FlutterChris Swan
Walkthrough of how Internet of Things (IoT) devices can run full stack Dart and connect to Flutter apps using end to end encryption to provide security and privacy.
Dart's popularity has surged in the past few years, as it's the language behind Flutter - Google's cross platform front end framework. That's now driving a notion of 'Full Stack Dart', where if you've spent time learning Dart for the front end, why not also use it for the back end.
London IoT Meetup Sep 2022 - End to end encrypted IoTChris Swan
Your thing, your data.
An overview of why end-to-end encryption is desirable for the Internet of Things (IoT), and how it can be done using personal data stores such as atSigns on the atPlatform.
Flutter Vikings 2022 - End to end IoT with Dart and FlutterChris Swan
Things need apps to manage them, which Flutter is great for, providing an easy way to build cross platform support. But things also need to get their data (securely and privately) to their apps, and Dart can be used for that. This presentation will walk through a use case demonstrated at Mobile World Congress (and now open sourced) that uses Dart to read sensor data through to Flutter for user presentation.
EMFcamp2022 - What if apps logged into you, instead of you logging into apps?Chris Swan
As a hacker and engineer I've been interested in identity and privacy since the dawn of the Internet and the online services it's enabled. For the past year I've been helping to build and open source The @ Platform, which inverts the usual model by giving everybody (and every thing) their own place to store data and control who (and what) has access to it. This talk will give an overview of the platform and its underlying protocol, and illustrate how it can be used to build privacy preserving apps and Internet connected things. It will also cover how the platform can be self hosted on devices like the Raspberry Pi, and how people can get involved in the open source community growing around it.
Devoxx UK 2022 - Application security: What should the attack landscape look ...Chris Swan
What do we need to do in the next few years to ensure that the attack landscape for 2030 isn't the same as 2020? Better languages and frameworks have already brought substantial improvements in memory safety, eliminating whole classes of vulnerabilities caused by buffer overflows. Yet despite a major reshuffle in 2021, the OWASP top 10 remains full of things that boil down to a lack of input validation, an issue that has bedevilled tech since its inception. We're all told that we shouldn't trust the input to our programs, and that validation is our best defence. But developers get precious little help on that front from today's languages and frameworks; something that can and should change. This talk will examine a hypothetical evolution of TypeScript - ValidScript - to consider a future where input validation is baked in.
Flutter Festival London 2022 - End to end IoT with Dart and FlutterChris Swan
A walk through of a demo system that was built for Mobile World Congress 2022 showing how Dart can be used to read data from a biometric sensor and send it to a Flutter front end application using end to end encryption.
Full Stack Squared 2022 - Power of Open SourceChris Swan
The document discusses the power of open source software and how people can get involved. It begins with an introduction of the author and covers the three types of "free" that define open source - free like beer meaning no cost, free like speech meaning freedom over the code, and free like puppy meaning ongoing maintenance is required. Famous people in open source like Richard Stallman, Eric Raymond, and Linus Torvalds are profiled. The document outlines how readers can get involved through contributing code, being considerate of maintainers, and participating in challenges. It concludes with contact information and a call for questions.
Flutter provides an excellent way to build Android, iOS, web and desktop apps, but what about the back end services? Full stack Dart is all about using that investment in Dart programming to build the services used by applications, whether it's in the cloud or on the Internet of Things. This presentation will look at the tradeoffs between just in time (JIT) and ahead of time (AOT) compilation, Dart on Docker, the Functions Framework for Dart, Profiling and Performance Management. Choices of back end architecture (x86_64 vs Arm) will also be examined, along with some of the challenges this can present for Continuous Delivery.
Why Dart?
Language features
JIT vs AOT
Dart on Docker
Functions Framework for Dart
Profiling and performance management
Other places you can learn more
Call to action - try out the Functions Framework Examples
This document summarizes a Raspberry Pi Sous Vide project that has been running for over 8 years. It details the project's longevity with stats on uptime, logs, and failed hardware components like temperature sensors and SD cards over time. The software has also evolved, including upgrades to the Raspberry Pi OS, changes to key dependencies, and a rewrite from Python 2 to Python 3. More details on the long-running project can be found online at the provided URL.
Dart on Arm - Flutter Bangalore June 2021Chris Swan
Running Dart on Arm servers, covering the trade offs between JIT and AOT. The dependencies needed for building and running AOT binaries, and how to cross compile Arm binaries.
The RC2014 system is built around a Z80 CPU, but is open and flexible enough to be used with alternatives. The presentation walks through a project to use Texas Instruments' TMS99xx parts, through to running 'Hello World' in BASIC and Forth.
The document contains summaries of several short talks or presentations on various topics such as ethics in technology, data bias, climate change, and social impact. The summaries are represented visually through maps or models linking different stages of product or service development to relevant approaches, tools, or considerations for each topic. Overall the document demonstrates using maps or models as a concise way to summarize key points that would be discussed in short talks.
DevSecOps Days London - Teaching 'Shift Left on Security'Chris Swan
Deck with backup screenshots of live demo of DevOps Dojo Yellow belt module 'Shift Left on Security' where students incorporate the OWASP dependency checking into a Jenkins CD pipeline around the Springboot Pet Clinic app.
The Rising Future of CPaaS in the Middle East 2024Yara Milbes
Explore "The Rising Future of CPaaS in the Middle East in 2024" with this comprehensive PPT presentation. Discover how Communication Platforms as a Service (CPaaS) is transforming communication across various sectors in the Middle East.
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...XfilesPro
Wondering how X-Sign gained popularity in such a short time span? This eSign functionality of XfilesPro DocuPrime has many advancements to offer for Salesforce users. Explore them now!
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsPeter Muessig
The UI5 tooling is the development and build tooling of UI5. It is built in a modular and extensible way so that it can be easily extended by your needs. This session will showcase various tooling extensions which can boost your development experience by far so that you can really work offline, transpile your code in your project to use even newer versions of EcmaScript (than 2022 which is supported right now by the UI5 tooling), consume any npm package of your choice in your project, using different kind of proxies, and even stitching UI5 projects during development together to mimic your target environment.
Preparing Non - Technical Founders for Engaging a Tech AgencyISH Technologies
Preparing non-technical founders before engaging a tech agency is crucial for the success of their projects. It starts with clearly defining their vision and goals, conducting thorough market research, and gaining a basic understanding of relevant technologies. Setting realistic expectations and preparing a detailed project brief are essential steps. Founders should select a tech agency with a proven track record and establish clear communication channels. Additionally, addressing legal and contractual considerations and planning for post-launch support are vital to ensure a smooth and successful collaboration. This preparation empowers non-technical founders to effectively communicate their needs and work seamlessly with their chosen tech agency. Visit our site for more details, or contact us today: www.ishtechnologies.com.au
14 th Edition of International conference on computer visionShulagnaSarkar2
About the event
14th Edition of International conference on computer vision
Computer conferences organized by the ScienceFather group. ScienceFather takes the privilege to invite speakers, participants, students, delegates and exhibitors from across the globe to its International Conference on Computer Vision, to be held in various cities around the world. The conferences are a discussion of common invention-related issues, and additionally a place to trade information and share ideas and insight into advanced developments in science and inventions. New technology may create many materials and devices with a vast range of applications, such as in science, medicine, electronics, biomaterials, energy production and consumer products.
Nominations are open! Don't miss it.
Visit: computer.scifat.com
Award Nomination: https://x-i.me/ishnom
Conference Submission: https://x-i.me/anicon
For Enquiry: Computer@scifat.com
UI5con 2024 - Bring Your Own Design SystemPeter Muessig
How do you combine the OpenUI5/SAPUI5 programming model with a design system that makes its controls available as Web Components? Since OpenUI5/SAPUI5 1.120, the framework supports the integration of any Web Components. This makes it possible, for example, to natively embed own Web Components of your design system which are created with Stencil. The integration embeds the Web Components in a way that they can be used naturally in XMLViews, like with standard UI5 controls, and can be bound with data binding. Learn how you can also make use of the Web Components base class in OpenUI5/SAPUI5 to also integrate your Web Components and get inspired by the solution to generate a custom UI5 library providing the Web Components control wrappers for the native ones.
How Can Hiring A Mobile App Development Company Help Your Business Grow?ToXSL Technologies
ToXSL Technologies is an award-winning Mobile App Development Company in Dubai that helps businesses reshape their digital possibilities with custom app services. As a top app development company in Dubai, we offer highly engaging iOS & Android app solutions. https://rb.gy/necdnt
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdfBaha Majid
IBM watsonx Code Assistant for Z, our latest Generative AI-assisted mainframe application modernization solution. Mainframe (IBM Z) application modernization is a topic that every mainframe client is addressing to various degrees today, driven largely from digital transformation. With generative AI comes the opportunity to reimagine the mainframe application modernization experience. Infusing generative AI will enable speed and trust, help de-risk, and lower total costs associated with heavy-lifting application modernization initiatives. This document provides an overview of the IBM watsonx Code Assistant for Z which uses the power of generative AI to make it easier for developers to selectively modernize COBOL business services while maintaining mainframe qualities of service.
Unveiling the Advantages of Agile Software Development.pdfbrainerhub1
Learn about Agile Software Development's advantages. Simplify your workflow to spur quicker innovation. Jump right in! We have also discussed the advantages.
Project Management: The Role of Project Dashboards.pdfKarya Keeper
Project management is a crucial aspect of any organization, ensuring that projects are completed efficiently and effectively. One of the key tools used in project management is the project dashboard, which provides a comprehensive view of project progress and performance. In this article, we will explore the role of project dashboards in project management, highlighting their key features and benefits.
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...The Third Creative Media
"Navigating Invideo: A Comprehensive Guide" is an essential resource for anyone looking to master Invideo, an AI-powered video creation tool. This guide provides step-by-step instructions, helpful tips, and comparisons with other AI video creators. Whether you're a beginner or an experienced video editor, you'll find valuable insights to enhance your video projects and bring your creative ideas to life.
Microservice Teams - How the cloud changes the way we workSven Peters
A lot of technical challenges and complexity come with building a cloud-native and distributed architecture. The way we develop backend software has fundamentally changed in the last ten years. Managing a microservices architecture demands a lot of us to ensure observability and operational resiliency. But did you also change the way you run your development teams?
Sven will talk about Atlassian’s journey from a monolith to a multi-tenanted architecture and how it affected the way the engineering teams work. You will learn how we shifted to service ownership, moved to more autonomous teams (and its challenges), and established platform and enablement teams.
2. copyright 2015
Agenda
• My background
• What do I mean by big data?
• Know your algorithm
• Know your data
• Performance
3. My background
CTO
CTO Client Experience
Co-head CTO Security
Corporate Finance
fintech, early stage
IT R&D – Networks and security
Grid, app server engineering
Combat System Engineer
5. Misquoting Roger Needham
"Whoever thinks their analytics problem is solved by big data, doesn't understand their analytics problem and doesn't understand big data"
7. Overview
Based on a blog post from April 2012 – http://is.gd/swbdla
[Chart: Problem Types. Axes: Algorithm Complexity vs. Data Volume; regions: Simple, Big Data, Quant]
9. Quant Problems
Any data volume, high algorithm complexity
10. Big Data Problems
High data volume, low algorithm complexity
Types of Big Data Problem:
1. Inherent
2. More data gives better result than more complex algorithm
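The second type (more data giving a better result than a more complex algorithm) can be illustrated with a toy sketch in pure Python, using hypothetical numbers: hold the estimator fixed at a simple sample mean and let added data alone improve the result.

```python
# Toy illustration: keep the algorithm simple (a sample mean)
# and watch the estimate improve as the data volume grows.
# The data source and noise level are hypothetical.
import random

random.seed(42)
TRUE_MEAN = 5.0

def noisy_sample(n):
    # Hypothetical data source: the true value plus Gaussian noise.
    return [TRUE_MEAN + random.gauss(0, 2) for _ in range(n)]

def estimate(n):
    # The "algorithm" never changes; only the data volume does.
    data = noisy_sample(n)
    return sum(data) / len(data)

small_error = abs(estimate(100) - TRUE_MEAN)
big_error = abs(estimate(100_000) - TRUE_MEAN)
# With high probability the larger sample lands closer to the truth.
```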
11. The good, the bad and the ugly of Big Data
Good
- Lots of new tools, mostly open source
Bad
- Term being abused by marketing departments
Ugly
- Can easily lead to over-reliance on systems that lack transparency and ignore specific data points
'Computer says no', but nobody can explain why
23. Don't agonise over distros
"The performance of Hadoop distros is all the same to within 1 server within a cluster"
Stefan Groschupf
One of the creators of Hadoop