Data Driven - The Ancestry Journey - 12-10-14

•Download as PPTX, PDF•

1 like•496 views

"Data Driven - The Ancestry Journey to self-service Analytics" December 10, 2014 Grand America Hotel Adam Davis - Data Visualization Lead

Technology

Data Driven: The Ancestry Journey to Self-
Service Analytics
Adam Davis – Data Visualization Lead
December 10, 2014 – Salt Lake City

Agenda
I. About Ancestry
II. Our story
III. Challenges & solutions
IV. Successes
V. Future opportunities

World’s largest online family history resource
5
Approx. 2.7 million paid subscribers across all family history sites

Data drives our business
6
• 14 billion digitized historical records
• 60 million family trees
• 6 billion profiles
• 200 million sharable photos, documents and written stories
• 10 petabytes of data

Our story
Tableau’s role in the Ancestry data strategy
9

Traditional BI tool challenges
• Dashboard bottleneck
- Team of 3
- Analysts wouldn’t use it
- Steep learning curve
10

The search for a self-service tool
• Executive challenge to become a data
driven org
• Needed to move quicker with discovering
and sharing insights
11

Self-service options explored
 Microstrategy Visual Insight
- Training
- Workshops
 Microsoft Power BI POC
- Power Pivot
- Power View
 Tableau Evaluation
- 2 week
- 30 desktop users 12

Tableau evaluation findings
• 2 weeks
• 120 views created
• Excel users were quickest adopters
• Prizes Awarded
- Most colorful
- Most viral
- Most put together
13

Self-Service Model
Challenges & Solutions

Adoption explodes
• In just over 1 year
• 100 desktop licenses
• 8 core CPU server license
 (Access for Everyone)
• Over 1500 Views
• More than 450 Workbooks
• Went from struggling with BI tool user
adoption to everyone wants to use it.
15

16
How do we avoid the
“Wild West” of reporting?

Governance vs. Self-Service
• Extracts
• Projects
• Portal
17

Extracts
• Kick off batch file in ETL
process using Tabcmd
• Update within an hour
of Data Warehouse
• Maintained by BI team
18

Projects
• Sandbox project for
all to use
• Projects for analysts
• Data Portal Project
19

Portal
• Consolidate Reporting
• Approved reports by FP&A, Analytics & BI
20

PR Mother’s Day campaign
• Featured in news articles
- Wall Street Journal
- Washington Post
- Time.com
- NY Daily News
• Featured as Viz of the Day
on Tableau Public
• Bullet one or paragraph heading
 Paragraph 2 contains first bullet point
- Paragraph 3 contains secondary bullet
25

“Our most talked about
and successful campaign.”
-Matt
27

Home Ownership
• Featured in news articles
- Washington Post
- Time.com
28

Vision for the future
• Hadoop & Hive
- Data exploration
# of views in 1 year: 1500+
• Adoption by additional departments in organization
- Find the “Excel Jockeys” with Big .XLS workbooks
- DNA Science Team
• Expand Functionality
- Metric monitoring
- Server tools
- Future Mobile 33

Key Takeaways
• Get Desktop in the hands of data driven individuals.
• Find a way to consolidate approved reporting.
• Start using Tableau Public.
• Get out of your own way and let Tableau work.
35

Thank you
Adam Davis, adam.davis@ancestry.com

The document discusses visualizing big data with tools like Hadoop, Hive, and Excel 2013. It provides an overview of big data technologies and data visualization with Office 365 and Power BI. It describes what Hive is and how it works, including how Hive solves the problem of analyzing large amounts of data by providing a SQL-like language (HiveQL) to query data stored in Hadoop and translating queries to MapReduce jobs. The document demonstrates visualizing big data with Microsoft tools like Power View and Power Map in Excel.

Webinar: The 5 Most Critical Things to Understand About Modern Data Integration

SnapLogic

In this webinar, we talk to industry analyst, author and practitioner David Linthicum who provides a state-of-the-technology explanation of big data integration. David also provides 5 critical and lesser known data integration requirements, how to understand today's requirements, and guidance for choosing the right approaches and technology to solve these problems. To learn more, visit: www.snaplogic.com/big-data

Applied Data Science Course Part 1: Concepts & your first ML model

Dataiku

Data-Driven - Open Data Use and Reuse

Troy James Palanca

This document summarizes a presentation on open data use and reuse. The presentation discusses the speaker's experience working with open data, including analyzing customs data and creating dashboards. It emphasizes that data can provide insights beyond individual stories by looking at broader trends. The speaker advocates improving data literacy and collection to promote more data-driven decision making and participatory governance. The goal is to get more people engaged in open data through hands-on projects and making the work fun and approachable.

"Don’t worry about people stealing an idea. If it’s original, you will have to ram it down their throats.” Howard Aiken, Founder of Harvard’s Computing Science Program. Data is moving so fast these days, and there is a shift whereby people are paying for value, not technology. This is where cloud computing comes in: it is very empowering, because anyone with an internet connection can access it. With Power BI in the cloud, small businesses are liberated with the ability to use the same tools and techniques to explore ideas as larger organisations. In this session, we will look at understanding the Power BI components and tools available in the cloud, including the Power BI Admin Center, Power Query, Power Pivot, Power View and Power Map. We will look at how to use them will accelerate ideas and help to clarify decisions, and related to this, discuss the roles within IT and the business in relation to these tools. We will also look at business puzzles versus business mysteries, a definition evoked by Malcolm Gladwell (Blink, Outliers) in relation to Power BI. “Out there in some garage is an entrepreneur who’s forging a bullet with your company’s name on it,” said Gary Hamel, a management guru. With Power BI, let’s see how you can translate your ideas in to a message that people can see, using cloud as an empowerment tool.

Dataiku - data driven nyc - april 2016 - the solitude of the data team m...

Dataiku

This document discusses the challenges faced by a data team manager named Hal in developing a data science software platform for his company. It describes Hal's background in technical fields like functional programming. It then outlines some of the disconnects Hal experienced in determining the appropriate technologies, hiring the right people, accessing needed data, and involving product teams. The document provides suggestions for how Hal can find solutions, such as taking a polyglot approach using open source technologies, creating an API culture, and focusing on solving big business problems to gain support.

What Does Big Data Really Mean for Your Business?

All Things Open

Big Data Visualisation with Hadoop and PowerPivot

Jen Stirrup

Walmart Big Data Expo

BigDataExpo

Sql rally amsterdam Aanalysing data with Power BI and Hive

Jen Stirrup

Analyzing Data with Power View (Level 100) Jen Stirrup Come learn about the best ways to present data to your Business Intelligence data consumers, and see how to apply these principles in Power View, Microsoft's data visualization tool. Using demos, we will investigate Power View based on current cognitive research around data visualization principles from such experts as Stephen Few, Edware Tufte, and others. We will then examine how data can be analyzed with Power View and look at where Power View is supplemented by other parts of the Microsoft Business Intelligence stack.

Overview of big data in cloud computing

Viet-Trung TRAN

Viet-Trung Tran presents information on big data and cloud computing. The document discusses key concepts like what constitutes big data, popular big data management systems like Hadoop and NoSQL databases, and how cloud computing can enable big data processing by providing scalable infrastructure. Some benefits of running big data analytics on the cloud include cost reduction, rapid provisioning, and flexibility/scalability. However, big data may not always be suitable for the cloud due to issues like data security, latency requirements, and multi-tenancy overhead.

Introduction to Big Data

Srinath Perera

This document provides an overview of big data and how it can be used to forecast and predict outcomes. It discusses how large amounts of data are now being collected from various sources like the internet, sensors, and real-world transactions. This data is stored and processed using technologies like MapReduce, Hadoop, stream processing, and complex event processing to discover patterns, build models, and make predictions. Examples of current predictions include weather forecasts, traffic patterns, and targeted marketing recommendations. The document outlines challenges in big data like processing speed, security, and privacy, but argues that with the right techniques big data can help further human goals of understanding, explaining, and anticipating what will happen in the future.

Big data

nikki135

This document provides an overview of big data, including what it is, how much data is generated every minute, the characteristics and challenges of big data, technologies used like Hadoop and MapReduce, how big data is stored, selected, and processed, applications in various industries, big data analytics, and benefits. It also discusses the future growth of big data and need for data scientists and analysts to support big data.

Business Intelligence Barista: What DataViz Tool to Use, and When?

Jen Stirrup

Choosing a data visualization tool is like being a barista serving coffee: everyone wants their data, their way, personalized, fast, and perfect. Many organizations have a cottage industry of data visualization tools, and it's difficult to know what tool to use, and when. Different tools exist in different departments, and if it doesn't meet the user requirements, the default position is to go back to Excel and move the data around there. This session will examine data visualization tools such as SSRS Excel, Tableau, QlikView, Datazen, Kibana and PowerBI, in order to craft and blend your data visualization tools to serve your data customers better.

Big data and beyond

Sastry N Penumarthy

This document discusses big data, including its key components and trends. It defines big data using the four V's: volume, velocity, variety, and veracity. The evolution of computing technologies like storage, processors, networks, and data centers enabled the collection of large amounts of diverse data that is generated and needs to be analyzed quickly. Components of big data systems include data storage, processing, management, analytics, and visualization tools. Leaders in big data include Facebook, Amazon, Netflix, Google, and others. Emerging trends discussed are Hadoop becoming mainstream, growth of cloud applications, and the integration of IoT, cloud, and big data.

Big Data Analytics with Qlik & Splunk, Qlik Qonnections

Geralyn Maloney

This document discusses big data analytics using Qlik and Splunk. It provides an overview of Splunk, describing it as a tool that indexes and makes data searchable as long as it has a time stamp. It then discusses some strengths and weaknesses of Splunk, including its capabilities for large-scale data indexing and search but weaker interactive visualization. The document proposes integrating Qlik and Splunk by building a custom connector to stream data directly from Splunk into Qlik's in-memory model for improved interactive visualization, slicing and dicing of real-time data. Screenshots of a prototype Splunk-Qlik connector are provided.

Strata Online_road_to_enterprise_data_2011

Lynn Langit

This document discusses the transition from traditional business intelligence (BI) to big data. It notes that BI focuses on structured transactional data and answering questions about the past, while big data leverages both structured and unstructured behavioral data from diverse sources to answer questions about the future. The document outlines technologies like Hadoop, NoSQL databases, and cloud computing that enable organizations to capture and analyze large, dynamic datasets. It also discusses the roles of data scientists and new types of visualizations and devices that support deriving insights from big data.

Machine learning in real-time - the next frontier

Snowplow Analytics

Snowplow had our debut at the Data Science Festival in London this April. It was a good chance for us to engage with the data science community and learn more about the important work data scientists are doing and how Snowplow best can support this work. We definitely learned a lot and would like to thank everyone who made it by our booth for a chat. Alex, Snowplow’s Co-Founder and CEO, held a lightning talk on machine learning in real-time. He is sharing a warning from the past and offer some suggestions and design constraints to not repeat the mistakes when it comes to building out your real-time ML capabilities.

Workshop_CITA2015

Bebo White

This document provides an agenda for the CITA'15 Workshop held in August 2015. The workshop schedule includes 4 sessions taking place between 8:30 am and 5:00 pm with morning and afternoon breaks. The workshop agenda covers topics such as big data analytics, open data, semantic data description using ontologies and RDF, and a case study on converting a dataset to linked open data. The format of the workshop will be interactive with exercises and discussion encouraged.

Bi 2.0 hadoop everywhere

Dmitry Tolpeko

This document discusses the rise of Hadoop and big data analytics skills needed for developers. It notes that Hadoop provides a scalable platform for distributed processing of all types of data in any format. It has become a universal data platform for enterprises. Developers now need skills in distributed systems, machine learning, and SQL-on-Hadoop tools. Both traditional data warehousing skills and new skills in Java, Scala, Python and distributed processing are important for software developers to have as big data becomes pervasive.

Puja(801),sanghamitra(819),surabhi(844)

puja singh

This document discusses big data, defining it as data that is too large and complex for traditional data processing systems due to its volume, variety and velocity. It outlines the 3Vs of big data - volume, referring to the large amount of data being generated daily; variety, referring to different data formats; and velocity, referring to the speed at which data is generated and needs to be processed. The document also discusses characteristics of big data like structured, semi-structured and unstructured data, benefits of big data, challenges of capturing, storing, analyzing and presenting big data, and technologies like Hadoop and MapReduce used for big data solutions.

Think Big - How to Design a Big Data Information Architecture

Inside Analysis

Exploratory Webcast for the Big Data Information Architecture Research Project Live Webcast Jan. 22, 2014 Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=32304b307fc5359a2f97b173166ea07b Big Data is everywhere -- that's for sure. But the big question for today's savvy enterprise is where, exactly, should it fit within the Information Architecture? Making that decision correctly can save a lot of money while adding significant value to any number of enterprise operations. Business processes can be improved with critical new data sets; marketing can excel at hitting the right targets quickly; sales can hit home runs by having a much deeper understanding of key prospects; and senior executives can see the big picture more clearly than ever before. Register for this Exploratory Webcast to hear veteran Analyst Dr. Robin Bloor outline the current landscape of Big Data, and offer guidance for today's organizations to determine how, when and where to deploy this powerful if unwieldy information asset. This event will kick off The Bloor Group's Interactive Research Report for 2014 which will focus on illuminating optimal Big Data Information Architectures. The series will include a dozen interviews with today's Big Data visionaries, plus three interactive Webcasts and a detailed findings report. Visit InsideAnalysis.com for more information.

Spring + QueryDSL + MongoDB Presentation

Ranga Vadlamudi

This document summarizes a presentation about Spring, Querydsl, and MongoDB. It introduces Spring and Spring Data frameworks, which make it easier to build Java applications and access data. It also describes Querydsl, a query building tool that works with Spring Data. The presentation demonstrates how to use Spring Data and Querydsl with MongoDB, a non-relational database, to build applications that can query and retrieve data from MongoDB in a type-safe way. Examples of building queries, entities, and repositories are provided.

Big datatraining ranga_1

Ranga Vadlamudi

This document provides an overview of Big Data training. It defines key concepts like volume, velocity, variety and veracity in Big Data. It discusses how Big Data is growing exponentially in terms of content, videos watched, and people online. It then introduces Hadoop, an open-source framework for distributed storage and processing of large datasets across clusters of commodity hardware. Key components of Hadoop like HDFS and MapReduce are explained. The document concludes with a discussion of Hadoop distributions and demonstrations of Cloudera, Cassandra and MongoDB.

Data Driven: The Ancestry.com Journey to Self-Service Analytics

William Yetman

The document summarizes Ancestry.com's journey to self-service analytics using Tableau. It discusses the challenges with their traditional BI tool, how they evaluated Tableau and other options, and how adopting Tableau helped overcome reporting bottlenecks. Key successes with Tableau included a Mother's Day PR campaign that was their most talked about and successful campaign, and allowing their A/B testing team to complete 40 requests for analysis in 3 days using a Tableau dashboard. Their vision for the future includes expanding Tableau usage to additional departments and data sources.

Tableau Lunch and Learn in SLC on 6-10-2014 (Bill Yetman and Adam Davis)

William Yetman

Ancestry.com transitioned to using Tableau for self-service business intelligence after facing challenges with their traditional BI tool. They found that Tableau enabled faster discovery and sharing of insights across their organization. Within 9 months of adopting Tableau, they went from a team of 3 analysts to over 800 views and 250 workbooks being created by their 100 desktop license users. Ancestry.com has seen successes from their PR and A/B testing teams using Tableau, and their future plans include integrating Tableau with Hadoop for more data exploration across additional departments.

What's hot

democratization of data sql-konferenz

Jen Stirrup

Dataiku - data driven nyc - april 2016 - the solitude of the data team m...

Dataiku

What Does Big Data Really Mean for Your Business?

All Things Open

Big Data Visualisation with Hadoop and PowerPivot

Jen Stirrup

Walmart Big Data Expo

BigDataExpo

Sql rally amsterdam Aanalysing data with Power BI and Hive

Jen Stirrup

Overview of big data in cloud computing

Viet-Trung TRAN

Introduction to Big Data

Srinath Perera

Big data

nikki135

Business Intelligence Barista: What DataViz Tool to Use, and When?

Jen Stirrup

Big data and beyond

Sastry N Penumarthy

Big Data Analytics with Qlik & Splunk, Qlik Qonnections

Geralyn Maloney

Strata Online_road_to_enterprise_data_2011

Lynn Langit

Machine learning in real-time - the next frontier

Snowplow Analytics

Workshop_CITA2015

Bebo White

Bi 2.0 hadoop everywhere

Dmitry Tolpeko

Puja(801),sanghamitra(819),surabhi(844)

puja singh

Think Big - How to Design a Big Data Information Architecture

Inside Analysis

Spring + QueryDSL + MongoDB Presentation

Ranga Vadlamudi

Big datatraining ranga_1

Ranga Vadlamudi

What's hot (20)

democratization of data sql-konferenz

Dataiku - data driven nyc - april 2016 - the solitude of the data team m...

What Does Big Data Really Mean for Your Business?

Big Data Visualisation with Hadoop and PowerPivot

Walmart Big Data Expo

Sql rally amsterdam Aanalysing data with Power BI and Hive

Overview of big data in cloud computing

Introduction to Big Data

Big data

Business Intelligence Barista: What DataViz Tool to Use, and When?

Big data and beyond

Big Data Analytics with Qlik & Splunk, Qlik Qonnections

Strata Online_road_to_enterprise_data_2011

Machine learning in real-time - the next frontier

Workshop_CITA2015

Bi 2.0 hadoop everywhere

Puja(801),sanghamitra(819),surabhi(844)

Think Big - How to Design a Big Data Information Architecture

Spring + QueryDSL + MongoDB Presentation

Big datatraining ranga_1

Similar to Data Driven - The Ancestry Journey - 12-10-14

Data Driven: The Ancestry.com Journey to Self-Service Analytics

William Yetman

Tableau Lunch and Learn in SLC on 6-10-2014 (Bill Yetman and Adam Davis)

William Yetman

2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...

datacite

Large scale computing

Bhupesh Bansal

LinkedIn is a large professional social network with 50 million users from around the world. It faces big data challenges at scale, such as caching a user's third degree network of up to 20 million connections and performing searches across 50 million user profiles. LinkedIn uses Hadoop and other scalable architectures like distributed search engines and custom graph engines to solve these problems. Hadoop provides a scalable framework to process massive amounts of user data across thousands of nodes through its MapReduce programming model and HDFS distributed file system.

Store, Extract, Transform, Load, Visualize. Untagged Conference

Ani Lopez

No doubt Visualization of Data is a key component of our industry. The path data travels since it is created till it takes shape in a chart is sometimes obscure and overlooked as it tends to live in the engineering side (when volume is relevant), an area where Data Scientist tend to visit but not the usual Web/Marketing Data Analyst. Nowadays the options to tame all that journey and make the best of it are many and they don't require extensive engineering knowledge. Small or Big Data, let's see what "Store, Extract, Transform, Load, Visualize" is all about.

Ellucian Live 2014 Presentation on Reporting and BI

Kent Brooks

This document summarizes a presentation about seven Wyoming community colleges migrating to a single statewide reporting system. The key points are: 1) The colleges previously had challenges with consistency, timing and accuracy of aggregate reporting to state entities due to using separate systems, so they migrated to a single SQL platform and reporting system. 2) The multi-year project involved migrating all colleges to the SQL environment, implementing Business Objects for reporting, designing a standard data set, and setting up a system for the Commission Office to report on behalf of the colleges. 3) Lessons learned included starting data preparation early, redesigning processes, rigorous testing, and later implementing additional business intelligence tools for real-time ad hoc

POWRR Tools: Lessons learned from an IMLS National Leadership Grant

Lynne Thomas

Séminaire Big Data Alter Way - Elasticsearch - octobre 2014

ALTER WAY

This document discusses Elasticsearch and how it can be used to search, analyze, and make sense of large amounts of data. It provides examples of how Elasticsearch is being used by large companies to handle petabytes of data and gain insights. Implementations in France are highlighted. The document concludes by demonstrating how easily Elasticsearch can be deployed and used to ingest and search sample data.

Semantics and Machine Learning

Vladimir Alexiev, PhD, PMP

"Semantic Integration Is What You Do Before The Deep Learning". dev.bg Machine Learning seminar, 13 May 2019. It's well known that 80\% of the effort of a data scientist is spent on data preparation. Semantic integration is arguably the best way to spend this effort more efficiently and to reuse it between tasks, projects and organizations. Knowledge Graphs (KG) and Linked Open Data (LOD) have become very popular recently. They are used by Google, Amazon, Bing, Samsung, Springer Nature, Microsoft Academic, AirBnb… and any large enterprise that would like to have a holistic (360 degree) view of its business. The Semantic Web (web 3.0) is a way to build a Giant Global Graph, just like the normal web is a Global Web of Documents. IEEE already talks about Big Data Semantics. We review the topic of KGs and their applicability to Machine Learning.

Big data

roysonli

This document provides an overview of big data concepts and technologies. It discusses the growth of data, characteristics of big data including volume, variety and velocity. Popular big data technologies like Hadoop, MapReduce, HDFS, Pig and Hive are explained. NoSQL databases like Cassandra, HBase and MongoDB are introduced. The document also covers massively parallel processing databases and column-oriented databases like Vertica. Overall, the document aims to give the reader a high-level understanding of the big data landscape and popular associated technologies.

Introduction to Big Data

Roi Blanco

Big data4businessusers

Bob Hardaway

Big-Data-Seminar-6-Aug-2014-Koenig

Manish Chopra

This document provides an introduction and overview of big data technologies. It begins with defining big data and its key characteristics of volume, variety and velocity. It discusses how data has exploded in recent years and examples of large scale data sources. It then covers popular big data tools and technologies like Hadoop and MapReduce. The document discusses how to get started with big data and learning related skills. Finally, it provides examples of big data projects and discusses the objectives and benefits of working with big data.

Forging Cultural Change: Transforming Your Organization Into a Data-Driven Ma...

Erika Roach

Presented by Nathan Fay and Erika Roach at 16NTC Conference: "Hear how employees at the Lucile Packard Foundation for Children’s Health at Stanford brought data analytics and Tableau to their organization. We will discuss approaches to creating cultural change with respect to new technology adoption: establishing a need, gaining influence and credibility, and demonstrating the value to organizational leaders. Building on this framework of cultural change, we will also discuss how to scale up your analytic culture with best practices and how to create a roadmap for success."

DMPTool for UMass eScience Symposium

Carly Strasser

DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...

Mihai Criveti

- The document discusses automating data science pipelines with DevOps tools like Ansible, Packer, and Kubernetes. - It covers obtaining data, exploring and modeling data, and how to automate infrastructure setup and deployment with tools like Packer to build machine images and Ansible for configuration management. - The rise of DevOps and its cultural aspects are discussed as well as how tools like Packer, Ansible, Kubernetes can help automate infrastructure and deploy machine learning models at scale in production environments.

Accelerating Data Lakes and Streams with Real-time Analytics

Arcadia Data

As organizations modernize their data and analytics platforms, the data lake concept has gained momentum as a shared enterprise resource for supporting insights across multiple lines of business. The perception is that data lakes are vast, slow-moving bodies of data, but innovations like Apache Kafka for streaming-first architectures put real-time data flows at the forefront. Combining real-time alerts and fast-moving data with rich historical analysis lets you respond quickly to changing business conditions with powerful data lake analytics to make smarter decisions. Join this complimentary webinar with industry experts from 451 Research and Arcadia Data who will discuss: - Business requirements for combining real-time streaming and ad hoc visual analytics. - Innovations in real-time analytics using tools like Confluent’s KSQL. - Machine-assisted visualization to guide business analysts to faster insights. - Elevating user concurrency and analytic performance on data lakes. - Applications in cybersecurity, regulatory compliance, and predictive maintenance on manufacturing equipment all benefit from streaming visualizations.

Department of Commerce App Challenge: Big Data Dashboards

Brand Niemann

The document summarizes Dr. Brand Niemann's presentation at the 2012 International Open Government Data Conference. It discusses open data principles and provides an example using EPA data. It also describes Niemann's beautiful spreadsheet dashboard for EPA metadata and APIs. Finally, it outlines Niemann's data science analytics approach for the conference, including knowledge bases, data catalog, and using business intelligence tools to analyze linked open government data.

Hadoop meets Agile! - An Agile Big Data Model

Uwe Printz

The document proposes an Agile Big Data model to address perceived issues with traditional Hadoop implementations. It discusses the motivation for change and outlines an Agile model with self-organized roles including data stewards, data scientists, project teams, and an architecture board. Key aspects of the proposed model include independent and self-managed project teams, a domain-driven data model, and emphasis on data quality and governance through the involvement of data stewards across domains.

Introducing Neo4j

Neo4j

This document summarizes a presentation about the graph database Neo4j. The presentation included an agenda that covered graphs and their power, how graphs change data views, and real-time recommendations with graphs. It introduced the presenters and discussed how data relationships unlock value. It described how Neo4j allows modeling data as a graph to unlock this value through relationship-based queries, evolution of applications, and high performance at scale. Examples showed how Neo4j outperforms relational and NoSQL databases when relationships are important. The presentation concluded with examples of how Neo4j customers have benefited.

Similar to Data Driven - The Ancestry Journey - 12-10-14 (20)

Data Driven: The Ancestry.com Journey to Self-Service Analytics

Tableau Lunch and Learn in SLC on 6-10-2014 (Bill Yetman and Adam Davis)

2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...

Large scale computing

Store, Extract, Transform, Load, Visualize. Untagged Conference

Ellucian Live 2014 Presentation on Reporting and BI

POWRR Tools: Lessons learned from an IMLS National Leadership Grant

Séminaire Big Data Alter Way - Elasticsearch - octobre 2014

Semantics and Machine Learning

Big data

Introduction to Big Data

Big data4businessusers

Big-Data-Seminar-6-Aug-2014-Koenig

Forging Cultural Change: Transforming Your Organization Into a Data-Driven Ma...

DMPTool for UMass eScience Symposium

DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...

Accelerating Data Lakes and Streams with Real-time Analytics

Department of Commerce App Challenge: Big Data Dashboards

Hadoop meets Agile! - An Agile Big Data Model

Introducing Neo4j

Recently uploaded

GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph

Neo4j

Generating privacy-protected synthetic data using Secludy and Milvus

Zilliz

During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.

HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU

panagenda

Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/ DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen! Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell. Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten. Diese Themen werden behandelt - Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten - Wie funktionieren CCB- und CCX-Lizenzen wirklich? - Verstehen des DLAU-Tools und wie man es am besten nutzt - Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw. - Praxisbeispiele und Best Practices zum sofortigen Umsetzen

Columbus Data & Analytics Wednesdays - June 2024

Jason Packer

Main news related to the CCS TSI 2023 (2023/1695)

Jakub Marek

An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers. The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 . The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .

Energy Efficient Video Encoding for Cloud and Edge Computing Instances

Alpen-Adria-Universität

Programming Foundation Models with DSPy - Meetup Slides

Zilliz

HCL Notes and Domino License Cost Reduction in the World of DLAU

panagenda

Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/ The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this! We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model. Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward. These topics will be covered - Reducing license cost by finding and fixing misconfigurations and superfluous accounts - How do CCB and CCX licenses really work? - Understanding the DLAU tool and how to best utilize it - Tips for common problem areas, like team mailboxes, functional/test users, etc - Practical examples and best practices to implement right away

Principle of conventional tomography-Bibash Shahi ppt..pptx

BibashShahi

zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...

Alex Pruden

Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty, is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security. Paper Link: https://eprint.iacr.org/2024/257

Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe

Precisely

Inconsistent user experience and siloed data, high costs, and changing customer expectations – Citizens Bank was experiencing these challenges while it was attempting to deliver a superior digital banking experience for its clients. Its core banking applications run on the mainframe and Citizens was using legacy utilities to get the critical mainframe data to feed customer-facing channels, like call centers, web, and mobile. Ultimately, this led to higher operating costs (MIPS), delayed response times, and longer time to market. Ever-changing customer expectations demand more modern digital experiences, and the bank needed to find a solution that could provide real-time data to its customer channels with low latency and operating costs. Join this session to learn how Citizens is leveraging Precisely to replicate mainframe data to its customer channels and deliver on their “modern digital bank” experiences.

JavaLand 2024: Application Development Green Masterplan

Miro Wengner

Northern Engraving | Nameplate Manufacturing Process - 2024

Northern Engraving

Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!

How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf

Chart Kalyan

Leveraging the Graph for Clinical Trials and Standards

Neo4j

Astute Business Solutions | Oracle Cloud Partner |

AstuteBusiness

9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...

saastr

What is an RPA CoE? Session 1 – CoE Vision

DianaGray10

Mutation Testing for Task-Oriented Chatbots

Pablo Gómez Abajo

Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots. To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.

Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...

Pitangent Analytics & Technology Solutions Pvt. Ltd

Recently uploaded (20)

GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph

Generating privacy-protected synthetic data using Secludy and Milvus

HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU

Columbus Data & Analytics Wednesdays - June 2024

Main news related to the CCS TSI 2023 (2023/1695)

Energy Efficient Video Encoding for Cloud and Edge Computing Instances

Programming Foundation Models with DSPy - Meetup Slides

HCL Notes and Domino License Cost Reduction in the World of DLAU

Principle of conventional tomography-Bibash Shahi ppt..pptx

zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...

Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe

JavaLand 2024: Application Development Green Masterplan

Northern Engraving | Nameplate Manufacturing Process - 2024

How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf

Leveraging the Graph for Clinical Trials and Standards

Astute Business Solutions | Oracle Cloud Partner |

9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...

What is an RPA CoE? Session 1 – CoE Vision

Mutation Testing for Task-Oriented Chatbots

Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...

Data Driven - The Ancestry Journey - 12-10-14

1. Data Driven: The Ancestry Journey to Self- Service Analytics Adam Davis – Data Visualization Lead December 10, 2014 – Salt Lake City

2. Agenda I. About Ancestry II. Our story III. Challenges & solutions IV. Successes V. Future opportunities

3. Ancestry Who we are 3

4. 4

5. World’s largest online family history resource 5 Approx. 2.7 million paid subscribers across all family history sites

6. Data drives our business 6 • 14 billion digitized historical records • 60 million family trees • 6 billion profiles • 200 million sharable photos, documents and written stories • 10 petabytes of data

8. 8

9. Our story Tableau’s role in the Ancestry data strategy 9

10. Traditional BI tool challenges • Dashboard bottleneck - Team of 3 - Analysts wouldn’t use it - Steep learning curve 10

11. The search for a self-service tool • Executive challenge to become a data driven org • Needed to move quicker with discovering and sharing insights 11

12. Self-service options explored  Microstrategy Visual Insight - Training - Workshops  Microsoft Power BI POC - Power Pivot - Power View  Tableau Evaluation - 2 week - 30 desktop users 12

13. Tableau evaluation findings • 2 weeks • 120 views created • Excel users were quickest adopters • Prizes Awarded - Most colorful - Most viral - Most put together 13

14. Self-Service Model Challenges & Solutions

15. Adoption explodes • In just over 1 year • 100 desktop licenses • 8 core CPU server license  (Access for Everyone) • Over 1500 Views • More than 450 Workbooks • Went from struggling with BI tool user adoption to everyone wants to use it. 15

16. 16 How do we avoid the “Wild West” of reporting?

17. Governance vs. Self-Service • Extracts • Projects • Portal 17

18. Extracts • Kick off batch file in ETL process using Tabcmd • Update within an hour of Data Warehouse • Maintained by BI team 18

19. Projects • Sandbox project for all to use • Projects for analysts • Data Portal Project 19

20. Portal • Consolidate Reporting • Approved reports by FP&A, Analytics & BI 20

21. Looking for a solution 21

22. Solution 22 http://getbootstrap.com/

23. DEMO: Approved reports portal 23

24. Tableau Public Success with public data

25. PR Mother’s Day campaign • Featured in news articles - Wall Street Journal - Washington Post - Time.com - NY Daily News • Featured as Viz of the Day on Tableau Public • Bullet one or paragraph heading  Paragraph 2 contains first bullet point - Paragraph 3 contains secondary bullet 25

26. PR Mother’s Day campaign 26

27. “Our most talked about and successful campaign.” -Matt 27

28. Home Ownership • Featured in news articles - Washington Post - Time.com 28

29. 29

30. 30

31. Back for More 31

32. Future opportunities What’s next 32

33. Vision for the future • Hadoop & Hive - Data exploration # of views in 1 year: 1500+ • Adoption by additional departments in organization - Find the “Excel Jockeys” with Big .XLS workbooks - DNA Science Team • Expand Functionality - Metric monitoring - Server tools - Future Mobile 33

34. Final Thoughts Key Takeaways 34

35. Key Takeaways • Get Desktop in the hands of data driven individuals. • Find a way to consolidate approved reporting. • Start using Tableau Public. • Get out of your own way and let Tableau work. 35

36. Thank you Adam Davis, adam.davis@ancestry.com

Editor's Notes

Discover, preserve, and share. Interesting part is the technologies needed to manipulate the data to deliver on this mission statement.
We are the largest online family history resource. All the sites under the Ancestry.com umbrella. 2.7 M paid subscribers
Data is key. Eric Shoup, our Executive VP of Product says “Ancestry is a technology company that masquerades as a Family History company. One of the best kept secrets are the technology challenges we deal with. Global content from 67 countries. Records that date back to 1370. Constantly adding an average of 2 million records daily to the 14 billion on the site. Large amount of user contributed content.

Data Driven - The Ancestry Journey - 12-10-14

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Data Driven - The Ancestry Journey - 12-10-14

Similar to Data Driven - The Ancestry Journey - 12-10-14 (20)

Recently uploaded

Recently uploaded (20)

Data Driven - The Ancestry Journey - 12-10-14

Editor's Notes