Data Culture Series - Keynote - 16th September 2014 – Jonathan Woodward
Big data. Small data. All data. You have access to an ever-expanding volume of data inside the walls of your business and out across the web. The potential in data is endless – from predicting election results to preventing the spread of epidemics. But how can you use it to your advantage to help move your business forward?
Drive a Data Culture within your organisation
Synapse is a solution provider with an innovative alternative to commercial off-the-shelf IT applications. Empowering business professionals to shape business processes without being chained to IT applications.
On Friday, September 25th, Devin Hopps led us through a presentation introducing Big Data and how technology has evolved to harness its power.
Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013 – Jen Stirrup
The document discusses visualizing big data with tools like Hadoop, Hive, and Excel 2013. It provides an overview of big data technologies and of data visualization with Office 365 and Power BI. It explains what Hive is and how it works: Hive makes large volumes of data analyzable by providing a SQL-like language (HiveQL) to query data stored in Hadoop, translating those queries into MapReduce jobs. The document demonstrates visualizing big data with Microsoft tools like Power View and Power Map in Excel.
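To make the Hive-to-MapReduce translation concrete, here is a minimal conceptual sketch in Python (not Hive's actual implementation) of how a HiveQL query such as `SELECT page, COUNT(*) FROM logs GROUP BY page` decomposes into map, shuffle, and reduce phases; the `logs` data is a made-up example.

```python
from collections import defaultdict

def map_phase(rows):
    """Emit a (group_key, 1) pair per input row, like Hive's generated mapper."""
    for row in rows:
        yield (row["page"], 1)

def shuffle(pairs):
    """Group intermediate pairs by key (done by the MapReduce framework)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Aggregate each group, like Hive's generated reducer for COUNT(*)."""
    return {key: sum(values) for key, values in groups.items()}

logs = [{"page": "/home"}, {"page": "/about"}, {"page": "/home"}]
counts = reduce_phase(shuffle(map_phase(logs)))
print(counts)  # {'/home': 2, '/about': 1}
```

The key point is that the analyst writes only the declarative query; Hive generates the equivalent of the three functions above and runs them across the cluster.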
Webinar: The 5 Most Critical Things to Understand About Modern Data Integration – SnapLogic
In this webinar, we talk to industry analyst, author and practitioner David Linthicum who provides a state-of-the-technology explanation of big data integration.
David also covers five critical and lesser-known data integration requirements, how to understand today's requirements, and guidance for choosing the right approaches and technology to solve these problems.
To learn more, visit: www.snaplogic.com/big-data
Applied Data Science Course Part 1: Concepts & your first ML model – Dataiku
In this first course of our Applied Data Science online course series, you'll learn about the mindset shift of going from small to big data, basic definitions and concepts, and an overview of the data science workflow.
This document summarizes a presentation on open data use and reuse. The presentation discusses the speaker's experience working with open data, including analyzing customs data and creating dashboards. It emphasizes that data can provide insights beyond individual stories by looking at broader trends. The speaker advocates improving data literacy and collection to promote more data-driven decision making and participatory governance. The goal is to get more people engaged in open data through hands-on projects and making the work fun and approachable.
“Don’t worry about people stealing an idea. If it’s original, you will have to ram it down their throats.” – Howard Aiken, Founder of Harvard’s Computing Science Program.
Data is moving fast these days, and there is a shift whereby people are paying for value, not technology. This is where cloud computing comes in: it is very empowering, because anyone with an internet connection can access it. With Power BI in the cloud, small businesses are liberated, able to use the same tools and techniques to explore ideas as larger organisations.
In this session, we will look at the Power BI components and tools available in the cloud, including the Power BI Admin Center, Power Query, Power Pivot, Power View and Power Map. We will look at how using them can accelerate ideas and help clarify decisions, and, related to this, discuss the roles of IT and the business in relation to these tools. We will also look at business puzzles versus business mysteries, a distinction evoked by Malcolm Gladwell (Blink, Outliers) in relation to Power BI.
“Out there in some garage is an entrepreneur who’s forging a bullet with your company’s name on it,” said Gary Hamel, a management guru. With Power BI, let’s see how you can translate your ideas into a message that people can see, using the cloud as an empowerment tool.
Dataiku - data driven nyc - april 2016 - the solitude of the data team m... – Dataiku
This document discusses the challenges faced by a data team manager named Hal in developing a data science software platform for his company. It describes Hal's background in technical fields like functional programming. It then outlines some of the disconnects Hal experienced in determining the appropriate technologies, hiring the right people, accessing needed data, and involving product teams. The document provides suggestions for how Hal can find solutions, such as taking a polyglot approach using open source technologies, creating an API culture, and focusing on solving big business problems to gain support.
What Does Big Data Really Mean for Your Business? – All Things Open
All Things Open 2014 - Day 1
Wednesday, October 22nd, 2014
Leslie Hawthorn
Director of Developer Relations for Elasticsearch
Big Data
What Does Big Data Really Mean for Your Business?
The document discusses the role of humans in an era of big data and machine learning. It outlines that humans are needed to tag data to help machines understand it, and that crowdsourcing is one way to obtain tagged data at scale. The presentation also covers how the human-in-the-loop paradigm involves humans actively training machine learning models through techniques like active learning.
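The active-learning technique mentioned above can be illustrated with a tiny sketch: the model asks a human to label only the examples it is least certain about (uncertainty sampling). The scoring function and pool below are hypothetical stand-ins, not a real classifier.

```python
def model_confidence(example):
    """Stand-in for a trained model: pretend P(positive) is precomputed."""
    return example["p_positive"]

def pick_query(unlabeled):
    """Uncertainty sampling: choose the example whose score is closest to 0.5,
    i.e. the one the model is most unsure about, and route it to a human."""
    return min(unlabeled, key=lambda ex: abs(model_confidence(ex) - 0.5))

pool = [
    {"id": "a", "p_positive": 0.95},  # model is confident: no human needed
    {"id": "b", "p_positive": 0.52},  # model is unsure: ask the human
    {"id": "c", "p_positive": 0.10},
]
query = pick_query(pool)
print(query["id"])  # b
```

In a full human-in-the-loop system this selection step runs in a loop: the human's label is added to the training set, the model is retrained, and the next most uncertain example is queried.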
SQL Rally Amsterdam: Analysing Data with Power BI and Hive – Jen Stirrup
Analyzing Data with Power View (Level 100)
Jen Stirrup
Come learn about the best ways to present data to your Business Intelligence data consumers, and see how to apply these principles in Power View, Microsoft's data visualization tool. Using demos, we will investigate Power View based on current cognitive research around data visualization principles from such experts as Stephen Few, Edward Tufte, and others. We will then examine how data can be analyzed with Power View and look at where Power View is supplemented by other parts of the Microsoft Business Intelligence stack.
Viet-Trung Tran presents information on big data and cloud computing. The document discusses key concepts like what constitutes big data, popular big data management systems like Hadoop and NoSQL databases, and how cloud computing can enable big data processing by providing scalable infrastructure. Some benefits of running big data analytics on the cloud include cost reduction, rapid provisioning, and flexibility/scalability. However, big data may not always be suitable for the cloud due to issues like data security, latency requirements, and multi-tenancy overhead.
This document provides an overview of big data and how it can be used to forecast and predict outcomes. It discusses how large amounts of data are now being collected from various sources like the internet, sensors, and real-world transactions. This data is stored and processed using technologies like MapReduce, Hadoop, stream processing, and complex event processing to discover patterns, build models, and make predictions. Examples of current predictions include weather forecasts, traffic patterns, and targeted marketing recommendations. The document outlines challenges in big data like processing speed, security, and privacy, but argues that with the right techniques big data can help further human goals of understanding, explaining, and anticipating what will happen in the future.
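The stream-processing idea mentioned above can be sketched briefly: rather than storing and re-scanning all data, a streaming system maintains a sliding window over an unbounded event stream and updates an aggregate incrementally. This is a minimal illustrative sketch with made-up readings, not any particular engine's API.

```python
from collections import deque

def rolling_averages(stream, window=3):
    """Yield the average of the last `window` readings after each event."""
    buf = deque(maxlen=window)  # oldest events fall out automatically
    for value in stream:
        buf.append(value)
        yield sum(buf) / len(buf)

readings = [10, 20, 30, 40]
print(list(rolling_averages(readings)))  # [10.0, 15.0, 20.0, 30.0]
```

Real stream processors apply the same principle at scale, with windows defined over event time and aggregates far richer than an average.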
This document provides an overview of big data, including what it is, how much data is generated every minute, the characteristics and challenges of big data, technologies used like Hadoop and MapReduce, how big data is stored, selected, and processed, applications in various industries, big data analytics, and benefits. It also discusses the future growth of big data and need for data scientists and analysts to support big data.
Business Intelligence Barista: What DataViz Tool to Use, and When? – Jen Stirrup
Choosing a data visualization tool is like being a barista serving coffee: everyone wants their data, their way, personalized, fast, and perfect. Many organizations have a cottage industry of data visualization tools, and it's difficult to know what tool to use, and when. Different tools exist in different departments, and if it doesn't meet the user requirements, the default position is to go back to Excel and move the data around there.
This session will examine data visualization tools such as SSRS, Excel, Tableau, QlikView, Datazen, Kibana and Power BI, in order to craft and blend your data visualization tools to serve your data customers better.
This document discusses big data, including its key components and trends. It defines big data using the four V's: volume, velocity, variety, and veracity. The evolution of computing technologies like storage, processors, networks, and data centers enabled the collection of large amounts of diverse data that is generated and needs to be analyzed quickly. Components of big data systems include data storage, processing, management, analytics, and visualization tools. Leaders in big data include Facebook, Amazon, Netflix, Google, and others. Emerging trends discussed are Hadoop becoming mainstream, growth of cloud applications, and the integration of IoT, cloud, and big data.
Big Data Analytics with Qlik & Splunk, Qlik Qonnections – Geralyn Maloney
This document discusses big data analytics using Qlik and Splunk. It provides an overview of Splunk, describing it as a tool that indexes and makes data searchable as long as it has a time stamp. It then discusses some strengths and weaknesses of Splunk, including its capabilities for large-scale data indexing and search but weaker interactive visualization. The document proposes integrating Qlik and Splunk by building a custom connector to stream data directly from Splunk into Qlik's in-memory model for improved interactive visualization, slicing and dicing of real-time data. Screenshots of a prototype Splunk-Qlik connector are provided.
This document discusses the transition from traditional business intelligence (BI) to big data. It notes that BI focuses on structured transactional data and answering questions about the past, while big data leverages both structured and unstructured behavioral data from diverse sources to answer questions about the future. The document outlines technologies like Hadoop, NoSQL databases, and cloud computing that enable organizations to capture and analyze large, dynamic datasets. It also discusses the roles of data scientists and new types of visualizations and devices that support deriving insights from big data.
Snowplow made our debut at the Data Science Festival in London this April. It was a good chance for us to engage with the data science community and learn more about the important work data scientists are doing and how Snowplow can best support it. We definitely learned a lot and would like to thank everyone who made it by our booth for a chat.
Alex, Snowplow’s Co-Founder and CEO, gave a lightning talk on machine learning in real time. He shared a warning from the past and offered some suggestions and design constraints for avoiding the same mistakes when building out your real-time ML capabilities.
This document provides an agenda for the CITA'15 Workshop held in August 2015. The workshop schedule includes 4 sessions taking place between 8:30 am and 5:00 pm with morning and afternoon breaks. The workshop agenda covers topics such as big data analytics, open data, semantic data description using ontologies and RDF, and a case study on converting a dataset to linked open data. The format of the workshop will be interactive with exercises and discussion encouraged.
This document discusses the rise of Hadoop and big data analytics skills needed for developers. It notes that Hadoop provides a scalable platform for distributed processing of all types of data in any format. It has become a universal data platform for enterprises. Developers now need skills in distributed systems, machine learning, and SQL-on-Hadoop tools. Both traditional data warehousing skills and new skills in Java, Scala, Python and distributed processing are important for software developers to have as big data becomes pervasive.
This document discusses big data, defining it as data that is too large and complex for traditional data processing systems due to its volume, variety and velocity. It outlines the 3Vs of big data - volume, referring to the large amount of data being generated daily; variety, referring to different data formats; and velocity, referring to the speed at which data is generated and needs to be processed. The document also discusses characteristics of big data like structured, semi-structured and unstructured data, benefits of big data, challenges of capturing, storing, analyzing and presenting big data, and technologies like Hadoop and MapReduce used for big data solutions.
Think Big - How to Design a Big Data Information Architecture – Inside Analysis
Exploratory Webcast for the Big Data Information Architecture Research Project
Live Webcast Jan. 22, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=32304b307fc5359a2f97b173166ea07b
Big Data is everywhere -- that's for sure. But the big question for today's savvy enterprise is where, exactly, should it fit within the Information Architecture? Making that decision correctly can save a lot of money while adding significant value to any number of enterprise operations. Business processes can be improved with critical new data sets; marketing can excel at hitting the right targets quickly; sales can hit home runs by having a much deeper understanding of key prospects; and senior executives can see the big picture more clearly than ever before.
Register for this Exploratory Webcast to hear veteran Analyst Dr. Robin Bloor outline the current landscape of Big Data, and offer guidance for today's organizations to determine how, when and where to deploy this powerful if unwieldy information asset. This event will kick off The Bloor Group's Interactive Research Report for 2014 which will focus on illuminating optimal Big Data Information Architectures. The series will include a dozen interviews with today's Big Data visionaries, plus three interactive Webcasts and a detailed findings report.
Visit InsideAnalysis.com for more information.
This document summarizes a presentation about Spring, Querydsl, and MongoDB. It introduces Spring and Spring Data frameworks, which make it easier to build Java applications and access data. It also describes Querydsl, a query building tool that works with Spring Data. The presentation demonstrates how to use Spring Data and Querydsl with MongoDB, a non-relational database, to build applications that can query and retrieve data from MongoDB in a type-safe way. Examples of building queries, entities, and repositories are provided.
This document provides an overview of Big Data training. It defines key concepts like volume, velocity, variety and veracity in Big Data. It discusses how Big Data is growing exponentially in terms of content, videos watched, and people online. It then introduces Hadoop, an open-source framework for distributed storage and processing of large datasets across clusters of commodity hardware. Key components of Hadoop like HDFS and MapReduce are explained. The document concludes with a discussion of Hadoop distributions and demonstrations of Cloudera, Cassandra and MongoDB.
Data Driven: The Ancestry.com Journey to Self-Service Analytics – William Yetman
The document summarizes Ancestry.com's journey to self-service analytics using Tableau. It discusses the challenges with their traditional BI tool, how they evaluated Tableau and other options, and how adopting Tableau helped overcome reporting bottlenecks. Key successes with Tableau included a Mother's Day PR campaign that was their most talked about and successful campaign, and allowing their A/B testing team to complete 40 requests for analysis in 3 days using a Tableau dashboard. Their vision for the future includes expanding Tableau usage to additional departments and data sources.
Tableau Lunch and Learn in SLC on 6-10-2014 (Bill Yetman and Adam Davis) – William Yetman
Ancestry.com transitioned to using Tableau for self-service business intelligence after facing challenges with their traditional BI tool. They found that Tableau enabled faster discovery and sharing of insights across their organization. Within 9 months of adopting Tableau, they went from a team of 3 analysts to over 800 views and 250 workbooks being created by their 100 desktop license users. Ancestry.com has seen successes from their PR and A/B testing teams using Tableau, and their future plans include integrating Tableau with Hadoop for more data exploration across additional departments.
"Don’t worry about people stealing an idea. If it’s original, you will have to ram it down their throats.” Howard Aiken, Founder of Harvard’s Computing Science Program.
Data is moving so fast these days, and there is a shift whereby people are paying for value, not technology. This is where cloud computing comes in: it is very empowering, because anyone with an internet connection can access it. With Power BI in the cloud, small businesses are liberated with the ability to use the same tools and techniques to explore ideas as larger organisations.
In this session, we will look at understanding the Power BI components and tools available in the cloud, including the Power BI Admin Center, Power Query, Power Pivot, Power View and Power Map. We will look at how to use them will accelerate ideas and help to clarify decisions, and related to this, discuss the roles within IT and the business in relation to these tools. We will also look at business puzzles versus business mysteries, a definition evoked by Malcolm Gladwell (Blink, Outliers) in relation to Power BI.
“Out there in some garage is an entrepreneur who’s forging a bullet with your company’s name on it,” said Gary Hamel, a management guru. With Power BI, let’s see how you can translate your ideas in to a message that people can see, using cloud as an empowerment tool.
Dataiku - data driven nyc - april 2016 - the solitude of the data team m...Dataiku
This document discusses the challenges faced by a data team manager named Hal in developing a data science software platform for his company. It describes Hal's background in technical fields like functional programming. It then outlines some of the disconnects Hal experienced in determining the appropriate technologies, hiring the right people, accessing needed data, and involving product teams. The document provides suggestions for how Hal can find solutions, such as taking a polyglot approach using open source technologies, creating an API culture, and focusing on solving big business problems to gain support.
What Does Big Data Really Mean for Your Business?All Things Open
All Things Open 2014 - Day 1
Wednesday, October 22nd, 2014
Leslie Hawthorn
Director of Developer Relations for Elasticsearch
Big Data
What Does Big Data Really Mean for Your Business?
The document discusses the role of humans in an era of big data and machine learning. It outlines that humans are needed to tag data to help machines understand it, and that crowdsourcing is one way to obtain tagged data at scale. The presentation also covers how the human-in-the-loop paradigm involves humans actively training machine learning models through techniques like active learning.
Sql rally amsterdam Aanalysing data with Power BI and HiveJen Stirrup
Analyzing Data with Power View (Level 100)
Jen Stirrup
Come learn about the best ways to present data to your Business Intelligence data consumers, and see how to apply these principles in Power View, Microsoft's data visualization tool. Using demos, we will investigate Power View based on current cognitive research around data visualization principles from such experts as Stephen Few, Edware Tufte, and others. We will then examine how data can be analyzed with Power View and look at where Power View is supplemented by other parts of the Microsoft Business Intelligence stack.
Viet-Trung Tran presents information on big data and cloud computing. The document discusses key concepts like what constitutes big data, popular big data management systems like Hadoop and NoSQL databases, and how cloud computing can enable big data processing by providing scalable infrastructure. Some benefits of running big data analytics on the cloud include cost reduction, rapid provisioning, and flexibility/scalability. However, big data may not always be suitable for the cloud due to issues like data security, latency requirements, and multi-tenancy overhead.
This document provides an overview of big data and how it can be used to forecast and predict outcomes. It discusses how large amounts of data are now being collected from various sources like the internet, sensors, and real-world transactions. This data is stored and processed using technologies like MapReduce, Hadoop, stream processing, and complex event processing to discover patterns, build models, and make predictions. Examples of current predictions include weather forecasts, traffic patterns, and targeted marketing recommendations. The document outlines challenges in big data like processing speed, security, and privacy, but argues that with the right techniques big data can help further human goals of understanding, explaining, and anticipating what will happen in the future.
This document provides an overview of big data, including what it is, how much data is generated every minute, the characteristics and challenges of big data, technologies used like Hadoop and MapReduce, how big data is stored, selected, and processed, applications in various industries, big data analytics, and benefits. It also discusses the future growth of big data and need for data scientists and analysts to support big data.
Business Intelligence Barista: What DataViz Tool to Use, and When?Jen Stirrup
Choosing a data visualization tool is like being a barista serving coffee: everyone wants their data, their way, personalized, fast, and perfect. Many organizations have a cottage industry of data visualization tools, and it's difficult to know what tool to use, and when. Different tools exist in different departments, and if it doesn't meet the user requirements, the default position is to go back to Excel and move the data around there.
This session will examine data visualization tools such as SSRS Excel, Tableau, QlikView, Datazen, Kibana and PowerBI, in order to craft and blend your data visualization tools to serve your data customers better.
This document discusses big data, including its key components and trends. It defines big data using the four V's: volume, velocity, variety, and veracity. The evolution of computing technologies like storage, processors, networks, and data centers enabled the collection of large amounts of diverse data that is generated and needs to be analyzed quickly. Components of big data systems include data storage, processing, management, analytics, and visualization tools. Leaders in big data include Facebook, Amazon, Netflix, Google, and others. Emerging trends discussed are Hadoop becoming mainstream, growth of cloud applications, and the integration of IoT, cloud, and big data.
Big Data Analytics with Qlik & Splunk, Qlik QonnectionsGeralyn Maloney
This document discusses big data analytics using Qlik and Splunk. It provides an overview of Splunk, describing it as a tool that indexes and makes data searchable as long as it has a time stamp. It then discusses some strengths and weaknesses of Splunk, including its capabilities for large-scale data indexing and search but weaker interactive visualization. The document proposes integrating Qlik and Splunk by building a custom connector to stream data directly from Splunk into Qlik's in-memory model for improved interactive visualization, slicing and dicing of real-time data. Screenshots of a prototype Splunk-Qlik connector are provided.
This document discusses the transition from traditional business intelligence (BI) to big data. It notes that BI focuses on structured transactional data and answering questions about the past, while big data leverages both structured and unstructured behavioral data from diverse sources to answer questions about the future. The document outlines technologies like Hadoop, NoSQL databases, and cloud computing that enable organizations to capture and analyze large, dynamic datasets. It also discusses the roles of data scientists and new types of visualizations and devices that support deriving insights from big data.
Snowplow had our debut at the Data Science Festival in London this April. It was a good chance for us to engage with the data science community and learn more about the important work data scientists are doing and how Snowplow best can support this work. We definitely learned a lot and would like to thank everyone who made it by our booth for a chat.
Alex, Snowplow’s Co-Founder and CEO, held a lightning talk on machine learning in real-time. He is sharing a warning from the past and offer some suggestions and design constraints to not repeat the mistakes when it comes to building out your real-time ML capabilities.
This document provides an agenda for the CITA'15 Workshop held in August 2015. The workshop schedule includes 4 sessions taking place between 8:30 am and 5:00 pm with morning and afternoon breaks. The workshop agenda covers topics such as big data analytics, open data, semantic data description using ontologies and RDF, and a case study on converting a dataset to linked open data. The format of the workshop will be interactive with exercises and discussion encouraged.
This document discusses the rise of Hadoop and big data analytics skills needed for developers. It notes that Hadoop provides a scalable platform for distributed processing of all types of data in any format. It has become a universal data platform for enterprises. Developers now need skills in distributed systems, machine learning, and SQL-on-Hadoop tools. Both traditional data warehousing skills and new skills in Java, Scala, Python and distributed processing are important for software developers to have as big data becomes pervasive.
This document discusses big data, defining it as data that is too large and complex for traditional data processing systems due to its volume, variety and velocity. It outlines the 3Vs of big data - volume, referring to the large amount of data being generated daily; variety, referring to different data formats; and velocity, referring to the speed at which data is generated and needs to be processed. The document also discusses characteristics of big data like structured, semi-structured and unstructured data, benefits of big data, challenges of capturing, storing, analyzing and presenting big data, and technologies like Hadoop and MapReduce used for big data solutions.
Think Big - How to Design a Big Data Information ArchitectureInside Analysis
Exploratory Webcast for the Big Data Information Architecture Research Project
Live Webcast Jan. 22, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=32304b307fc5359a2f97b173166ea07b
Big Data is everywhere -- that's for sure. But the big question for today's savvy enterprise is where, exactly, should it fit within the Information Architecture? Making that decision correctly can save a lot of money while adding significant value to any number of enterprise operations. Business processes can be improved with critical new data sets; marketing can excel at hitting the right targets quickly; sales can hit home runs by having a much deeper understanding of key prospects; and senior executives can see the big picture more clearly than ever before.
Register for this Exploratory Webcast to hear veteran Analyst Dr. Robin Bloor outline the current landscape of Big Data, and offer guidance for today's organizations to determine how, when and where to deploy this powerful if unwieldy information asset. This event will kick off The Bloor Group's Interactive Research Report for 2014 which will focus on illuminating optimal Big Data Information Architectures. The series will include a dozen interviews with today's Big Data visionaries, plus three interactive Webcasts and a detailed findings report.
Visit InsideAnalysis.com for more information.
This document summarizes a presentation about Spring, Querydsl, and MongoDB. It introduces Spring and Spring Data frameworks, which make it easier to build Java applications and access data. It also describes Querydsl, a query building tool that works with Spring Data. The presentation demonstrates how to use Spring Data and Querydsl with MongoDB, a non-relational database, to build applications that can query and retrieve data from MongoDB in a type-safe way. Examples of building queries, entities, and repositories are provided.
This document provides an overview of Big Data training. It defines key concepts like volume, velocity, variety and veracity in Big Data. It discusses how Big Data is growing exponentially in terms of content, videos watched, and people online. It then introduces Hadoop, an open-source framework for distributed storage and processing of large datasets across clusters of commodity hardware. Key components of Hadoop like HDFS and MapReduce are explained. The document concludes with a discussion of Hadoop distributions and demonstrations of Cloudera, Cassandra and MongoDB.
Data Driven: The Ancestry.com Journey to Self-Service Analytics (William Yetman)
The document summarizes Ancestry.com's journey to self-service analytics using Tableau. It discusses the challenges with their traditional BI tool, how they evaluated Tableau and other options, and how adopting Tableau helped overcome reporting bottlenecks. Key successes with Tableau included a Mother's Day PR campaign that was their most talked about and successful campaign, and allowing their A/B testing team to complete 40 requests for analysis in 3 days using a Tableau dashboard. Their vision for the future includes expanding Tableau usage to additional departments and data sources.
Tableau Lunch and Learn in SLC on 6-10-2014 (Bill Yetman and Adam Davis)
Ancestry.com transitioned to using Tableau for self-service business intelligence after facing challenges with their traditional BI tool. They found that Tableau enabled faster discovery and sharing of insights across their organization. Within 9 months of adopting Tableau, they went from a team of 3 analysts to over 800 views and 250 workbooks being created by their 100 desktop license users. Ancestry.com has seen successes from their PR and A/B testing teams using Tableau, and their future plans include integrating Tableau with Hadoop for more data exploration across additional departments.
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...) (datacite)
2013 DataCite Summer Meeting - Making Research better
DataCite. Co-sponsored by CODATA.
Thursday, 19 September 2013 at 13:00 - Friday, 20 September 2013 at 12:30
Washington, DC. National Academy of Sciences
http://datacite.eventbrite.co.uk/
LinkedIn is a large professional social network with 50 million users from around the world. It faces big data challenges at scale, such as caching a user's third degree network of up to 20 million connections and performing searches across 50 million user profiles. LinkedIn uses Hadoop and other scalable architectures like distributed search engines and custom graph engines to solve these problems. Hadoop provides a scalable framework to process massive amounts of user data across thousands of nodes through its MapReduce programming model and HDFS distributed file system.
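The "third degree network" computation mentioned above amounts to a breadth-first search bounded at depth 3. LinkedIn does this with a custom distributed graph engine over tens of millions of members; the in-memory sketch below, with an invented toy graph, only illustrates the traversal itself.

```python
from collections import deque

def connections_within(graph, start, max_degree=3):
    # Breadth-first search that stops expanding once max_degree hops away.
    seen = {start}
    frontier = deque([(start, 0)])
    result = set()
    while frontier:
        node, degree = frontier.popleft()
        if degree == max_degree:
            continue  # don't expand past the degree limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                result.add(neighbor)
                frontier.append((neighbor, degree + 1))
    return result

# Toy chain graph: "e" is four hops from "a", so it falls outside degree 3.
graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
reachable = connections_within(graph, "a")
```

At LinkedIn's scale the same traversal touches up to 20 million nodes per user, which is why it needs a purpose-built engine rather than a loop like this.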
No doubt visualization of data is a key component of our industry. The path data travels from the moment it is created until it takes shape in a chart is sometimes obscure and overlooked, as it tends to live on the engineering side (when volume is relevant), an area where Data Scientists tend to visit but the usual Web/Marketing Data Analyst does not. Nowadays the options to tame that whole journey and make the best of it are many, and they don't require extensive engineering knowledge. Small or Big Data, let's see what "Store, Extract, Transform, Load, Visualize" is all about.
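As a minimal illustration of that "Store, Extract, Transform, Load, Visualize" journey, the sketch below pushes a made-up CSV string through one plain function per step, ending in a text bar chart. Real pipelines swap each function for a storage system, an ingestion job, a transformation layer, a warehouse load, and a visualization tool, but the shape of the journey is the same.

```python
import csv
import io

# "Store": in reality a file or database; here just an inline CSV string.
RAW = "region,amount\nnorth,10\nsouth,5\nnorth,7\n"

def extract(text):
    # "Extract": read the stored rows into dictionaries.
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    # "Transform": coerce types and keep only the fields we need.
    return [(row["region"], int(row["amount"])) for row in rows]

def load(pairs):
    # "Load": fold the clean records into an aggregate per region.
    totals = {}
    for region, amount in pairs:
        totals[region] = totals.get(region, 0) + amount
    return totals

def visualize(totals):
    # "Visualize": a text bar chart, one '#' per unit of the total.
    return {region: "#" * value for region, value in sorted(totals.items())}

chart = visualize(load(transform(extract(RAW))))
```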
Ellucian Live 2014 Presentation on Reporting and BI (Kent Brooks)
This document summarizes a presentation about seven Wyoming community colleges migrating to a single statewide reporting system. The key points are:
1) The colleges previously had challenges with consistency, timing and accuracy of aggregate reporting to state entities due to using separate systems, so they migrated to a single SQL platform and reporting system.
2) The multi-year project involved migrating all colleges to the SQL environment, implementing Business Objects for reporting, designing a standard data set, and setting up a system for the Commission Office to report on behalf of the colleges.
3) Lessons learned included starting data preparation early, redesigning processes, rigorous testing, and later implementing additional business intelligence tools for real-time ad hoc reporting.
Alter Way Big Data Seminar - Elasticsearch - October 2014 (ALTER WAY)
This document discusses Elasticsearch and how it can be used to search, analyze, and make sense of large amounts of data. It provides examples of how Elasticsearch is being used by large companies to handle petabytes of data and gain insights. Implementations in France are highlighted. The document concludes by demonstrating how easily Elasticsearch can be deployed and used to ingest and search sample data.
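Much of what makes Elasticsearch fast to search is the inverted index it builds (via Lucene), mapping each term to the documents containing it. The sketch below is a hypothetical pure-Python miniature of that idea, with invented sample documents; it omits analysis, relevance scoring, and distribution entirely.

```python
from collections import defaultdict

def build_index(docs):
    # Inverted index: term -> set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, *terms):
    # AND query: intersect the posting sets of every requested term.
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()

docs = {1: "big data search", 2: "search at scale", 3: "big clusters"}
index = build_index(docs)
hits = search(index, "big", "search")
```

Because lookups go term-first instead of document-first, query cost scales with the size of the posting lists rather than the size of the corpus — the same reason Elasticsearch stays fast at petabyte scale.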
"Semantic Integration Is What You Do Before The Deep Learning". dev.bg Machine Learning seminar, 13 May 2019.
It's well known that 80% of the effort of a data scientist is spent on data preparation. Semantic integration is arguably the best way to spend this effort more efficiently and to reuse it between tasks, projects and organizations. Knowledge Graphs (KG) and Linked Open Data (LOD) have become very popular recently. They are used by Google, Amazon, Bing, Samsung, Springer Nature, Microsoft Academic, AirBnb… and any large enterprise that would like to have a holistic (360 degree) view of its business. The Semantic Web (web 3.0) is a way to build a Giant Global Graph, just like the normal web is a Global Web of Documents. IEEE already talks about Big Data Semantics. We review the topic of KGs and their applicability to Machine Learning.
This document provides an overview of big data concepts and technologies. It discusses the growth of data, characteristics of big data including volume, variety and velocity. Popular big data technologies like Hadoop, MapReduce, HDFS, Pig and Hive are explained. NoSQL databases like Cassandra, HBase and MongoDB are introduced. The document also covers massively parallel processing databases and column-oriented databases like Vertica. Overall, the document aims to give the reader a high-level understanding of the big data landscape and popular associated technologies.
Slides used for the keynote at the event Big Data & Data Science http://eventos.citius.usc.es/bigdata/
Some slides are borrowed from other Hadoop/big data presentations
Big Data brings big promise and also big challenges, the foremost being the ability to deliver value to business stakeholders who are not data scientists!
This document provides an introduction and overview of big data technologies. It begins with defining big data and its key characteristics of volume, variety and velocity. It discusses how data has exploded in recent years and examples of large scale data sources. It then covers popular big data tools and technologies like Hadoop and MapReduce. The document discusses how to get started with big data and learning related skills. Finally, it provides examples of big data projects and discusses the objectives and benefits of working with big data.
Forging Cultural Change: Transforming Your Organization Into a Data-Driven Ma... (Erika Roach)
Presented by Nathan Fay and Erika Roach at 16NTC Conference: "Hear how employees at the Lucile Packard Foundation for Children’s Health at Stanford brought data analytics and Tableau to their organization. We will discuss approaches to creating cultural change with respect to new technology adoption: establishing a need, gaining influence and credibility, and demonstrating the value to organizational leaders.
Building on this framework of cultural change, we will also discuss how to scale up your analytic culture with best practices and how to create a roadmap for success."
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,... (Mihai Criveti)
- The document discusses automating data science pipelines with DevOps tools like Ansible, Packer, and Kubernetes.
- It covers obtaining data, exploring and modeling data, and how to automate infrastructure setup and deployment with tools like Packer to build machine images and Ansible for configuration management.
- The rise of DevOps and its cultural aspects are discussed as well as how tools like Packer, Ansible, Kubernetes can help automate infrastructure and deploy machine learning models at scale in production environments.
Accelerating Data Lakes and Streams with Real-time Analytics (Arcadia Data)
As organizations modernize their data and analytics platforms, the data lake concept has gained momentum as a shared enterprise resource for supporting insights across multiple lines of business. The perception is that data lakes are vast, slow-moving bodies of data, but innovations like Apache Kafka for streaming-first architectures put real-time data flows at the forefront. Combining real-time alerts and fast-moving data with rich historical analysis lets you respond quickly to changing business conditions with powerful data lake analytics to make smarter decisions.
Join this complimentary webinar with industry experts from 451 Research and Arcadia Data who will discuss:
- Business requirements for combining real-time streaming and ad hoc visual analytics.
- Innovations in real-time analytics using tools like Confluent’s KSQL.
- Machine-assisted visualization to guide business analysts to faster insights.
- Elevating user concurrency and analytic performance on data lakes.
- Applications in cybersecurity, regulatory compliance, and predictive maintenance on manufacturing equipment that benefit from streaming visualizations.
Department of Commerce App Challenge: Big Data Dashboards (Brand Niemann)
The document summarizes Dr. Brand Niemann's presentation at the 2012 International Open Government Data Conference. It discusses open data principles and provides an example using EPA data. It also describes Niemann's beautiful spreadsheet dashboard for EPA metadata and APIs. Finally, it outlines Niemann's data science analytics approach for the conference, including knowledge bases, data catalog, and using business intelligence tools to analyze linked open government data.
Hadoop meets Agile! - An Agile Big Data Model (Uwe Printz)
The document proposes an Agile Big Data model to address perceived issues with traditional Hadoop implementations. It discusses the motivation for change and outlines an Agile model with self-organized roles including data stewards, data scientists, project teams, and an architecture board. Key aspects of the proposed model include independent and self-managed project teams, a domain-driven data model, and emphasis on data quality and governance through the involvement of data stewards across domains.
This document summarizes a presentation about the graph database Neo4j. The presentation included an agenda that covered graphs and their power, how graphs change data views, and real-time recommendations with graphs. It introduced the presenters and discussed how data relationships unlock value. It described how Neo4j allows modeling data as a graph to unlock this value through relationship-based queries, evolution of applications, and high performance at scale. Examples showed how Neo4j outperforms relational and NoSQL databases when relationships are important. The presentation concluded with examples of how Neo4j customers have benefited.
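A relationship-based query of the kind described above can be mimicked naively in plain Python: recommend friends-of-friends ranked by how many mutual connections they share with you. The graph and names below are invented for illustration; in Neo4j the same question is a short Cypher pattern match, and the point made above is that the graph form stays fast as relationships multiply.

```python
from collections import Counter

def recommend(graph, person):
    # Friend-of-friend recommendation: candidates two hops away,
    # ranked by the number of mutual friends (distinct paths).
    direct = set(graph.get(person, ()))
    candidates = Counter()
    for friend in direct:
        for fof in graph.get(friend, ()):
            if fof != person and fof not in direct:
                candidates[fof] += 1  # one mutual friend per path
    return [name for name, _ in candidates.most_common()]

# Invented toy social graph (adjacency lists, symmetric friendships).
graph = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan", "eve"],
    "dan": ["bob", "cat"],
    "eve": ["cat"],
}
suggestions = recommend(graph, "ann")  # "dan" ranks first: two mutual friends
```

In a relational store this query becomes a self-join whose cost grows with table size; a graph engine walks only the local neighborhood, which is the performance difference the presentation highlights.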
5. World’s largest online family history resource
Approx. 2.7 million paid subscribers across all family history sites
6. Data drives our business
• 14 billion digitized historical records
• 60 million family trees
• 6 billion profiles
• 200 million sharable photos, documents and written stories
• 10 petabytes of data
10. Traditional BI tool challenges
• Dashboard bottleneck
- Team of 3
- Analysts wouldn’t use it
- Steep learning curve
11. The search for a self-service tool
• Executive challenge to become a data-driven org
• Needed to move quicker with discovering and sharing insights
12. Self-service options explored
Microstrategy Visual Insight
- Training
- Workshops
Microsoft Power BI POC
- Power Pivot
- Power View
Tableau Evaluation
- 2 weeks
- 30 desktop users
13. Tableau evaluation findings
• 2 weeks
• 120 views created
• Excel users were quickest adopters
• Prizes Awarded
- Most colorful
- Most viral
- Most put together
15. Adoption explodes
• In just over 1 year
• 100 desktop licenses
• 8 core CPU server license
(Access for Everyone)
• Over 1500 Views
• More than 450 Workbooks
• Went from struggling with BI tool user adoption to everyone wanting to use it
16. How do we avoid the "Wild West" of reporting?
25. PR Mother’s Day campaign
• Featured in news articles
- Wall Street Journal
- Washington Post
- Time.com
- NY Daily News
• Featured as Viz of the Day on Tableau Public
33. Vision for the future
• Hadoop & Hive
- Data exploration
# of views in 1 year: 1500+
• Adoption by additional departments in organization
- Find the “Excel Jockeys” with Big .XLS workbooks
- DNA Science Team
• Expand Functionality
- Metric monitoring
- Server tools
- Future Mobile
35. Key Takeaways
• Get Desktop in the hands of data driven individuals.
• Find a way to consolidate approved reporting.
• Start using Tableau Public.
• Get out of your own way and let Tableau work.
Discover, preserve, and share.
The interesting part is the technologies needed to manipulate the data to deliver on this mission statement.
We are the largest online family history resource, counting all the sites under the Ancestry.com umbrella, with approximately 2.7 million paid subscribers.
Data is key. Eric Shoup, our Executive VP of Product, says "Ancestry is a technology company that masquerades as a Family History company." One of the best-kept secrets is the technology challenges we deal with: global content from 67 countries, records that date back to 1370, an average of 2 million records added daily to the 14 billion on the site, and a large amount of user-contributed content.