The document provides an overview of presentations and discussions from the Strata Conference + Hadoop World 2013. Key themes included:
- The importance of understanding business needs and asking the right questions of data
- Choosing the right tools for each problem, not just popular ones, from the big data ecosystem
- Moving beyond just managing large volumes of data to delivering actionable insights
- The value of prototyping and experimentation in building support for new techniques
Henry Ford once said, "The only real mistake is the one from which we learn nothing." So how can we learn the most from system failures? In this session, we move beyond "blameless" postmortems to show how we can use data to mitigate future failures. We share best practices for gathering systems-related and people-related data, and we discuss how to use that data to formulate actionable response plans and avoid repeating failures.
Best practices in building machine learning models in Azure ML - Zeydy Ortiz, Ph.D.
Microsoft Azure ML Studio provides an easy-to-use interface to build and deploy machine learning models. However, the user must carefully select and configure the modules in order to derive meaningful results. In this presentation, I discuss a case study to highlight best practices in building machine learning models.
In the age of information overload, having a social media measurement practice is the key to successful execution of your social strategy. In this session, Debra Askanase looked at which data points tell you that your community cares and is willing to take action, a methodology for figuring out what data is relevant to your outcomes, where to find the metrics that matter, and why setting up the right metrics can make the difference between knowing that people visited a page on your website and knowing whether your social media actions sent them there.
My presentation on Data Mining, Lessons from Competitions, and Public Data looks at the Data Mining/Data Science/Big Data evolution, reviews lessons from KDD Cup 1997, Netflix Prize, and Kaggle, presents a big list of Public and Government data APIs, Marketplaces, Portals, and Platforms, and examines Big Data Hype. This talk was given at BPDM-2013 (Broadening Participation in Data Mining) on Aug 10, 2013, held at KDD-2013, Chicago.
An intro into AI and how business leaders should use it - Lutz Finger
FOUR RULES: (1) AI is a tool not a business model. (2) Protect your data but use federated learning to share models. (3) Regulation should guide you & not stop you - use tools like LIME. (4) Think holistically and build a data culture of fair data usage.
Ofer Ron, senior data scientist at LivePerson.
Recently, I've had the pleasure of presenting an introduction to Data Science and data-driven products at DevconTLV.
I focused this talk around the basic ideas of data science, not the technology used, since I thought that far too many times companies and developers rush to play around with "big data" related technologies, instead of figuring out what questions they want to answer, and whether these answers form a successful product.
Data Science Popup Austin: Conflict in Growing Data Science Organizations - Domino Data Lab
Watch talk ➟ http://bit.ly/1NKPpQh
Eduardo Arino De La Rubia, VP of Product and Data Scientist in residence at Domino Data Lab, talks about how to manage conflict in growing data science teams.
Ordinary people include anyone who is not a geek like myself. This book is written for ordinary people. That includes managers, marketers, technical writers, couch potatoes and so on.
Data Science and Analytics for Ordinary People is a collection of blogs I have written on LinkedIn over the past year. As I continue to perform big data analytics, I continue to discover not only my weaknesses in communicating the information, but also new insights into using the information obtained from analytics and communicating it. These are the kinds of things I blog about, and they are collected herein.
What's the Value of Data Science for Organizations: Tips for Invincibility in... - Ganes Kesari
This session was delivered as an Open Colloquium on Apr 30th 2020 for the Master in Information program students. It was organized by the Rutgers School of Communication & Information.
The session covers 3 themes:
- How do enterprises and not-for-profit organizations gain value from data science?
- What are the biggest challenges in data science that professionals are unaware of? How can students translate that into learnings, to make themselves indispensable in the industry?
- What's the impact of COVID-19 and the recession on the data science industry? How will data jobs be impacted?
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro - Data ScienceTech Institute
Data Science Tech Institute - Big Data and Data Science Conference around Dr Gregory Piatetsky-Shapiro.
Keynote - An overview on Big Data & Data Science Dr Gregory Piatetsky-Shapiro - KDnuggets.com Founder & Editor.
Paris May 23rd & Nice May 26th 2016 @ Data ScienceTech Institute (https://www.datasciencetech.institute/)
Using big data and implementing Hadoop is a trend that people jump to all too quickly. Instead, understanding the run-time complexity of one's algorithms, reducing said complexity, and managing the process from start to finish in a lean and agile way can yield massive cost savings - or save your organization.
TOC 2011: Content as Application, presented by Scott Grillo - Silverchair
Content as Application: Integrating Medical Books into the Healthcare Workflow. Presented at TOC 2011 by Scott Grillo, Vice President and Group Publisher for McGraw-Hill Medical, a division of McGraw-Hill’s Higher Education, Professional
During a business trip in Baler, Aurora, we stayed at Costa Pacifica, which had great facilities and awesome amenities. See the photos and find out why staying here is worth your while.
Mobile & Desktop Cache 2.0: How To Create A Scriptable Cache - Blaze Software Inc.
In this webinar, we’ll describe how you can build your own Scriptable Cache based on HTML5 localStorage, making Mobile cache work and giving Desktop Cache a boost. We’ll discuss the value of a Scriptable Cache, show the key elements you’ll need to create, and mention some of the pitfalls you need to beware of.
Tim Prendergast, CEO of Evident.io, shares insights into the powerful combination of Cloud Security and DevOps practices at Velocity Conference 2015 in Santa Clara, CA, USA. Learn how agility, security, and automation can be combined to perform continuous security assessments, real-time automated defensive measures, and other exciting security capabilities!
Our talk slides from UX Australia 2014.
In 2014, it’s no longer enough for our designed products and experiences to only be usable. Today’s modern audiences are exposed to an ever-increasing number of delightful, pleasurable and memorable experiences on a daily basis. Expectations are at an all-time high.
In a world where the perceived value associated with delightful experiences can help set a product apart from the competition, how can we as the designers of experiences stay on top of our game? What is this intangible ‘delight’ thing anyway, and is it even possible to create it?
We are killing serendipity with mobile applications, and I could not be happier.
Contact me on Twitter: @schneidermike to discuss. Thanks
Locked Out in London (and tweeting about it) - Sylvain Carle
Last year I talked about how people sucked at naming places.
This year I am going to talk about how stupid we can seem.
And the fact that we brag about it with geo metadata to boot.
All my examples are from Needium, our platform that matches expressed needs to a location and to businesses that can answer them.
TOC 2011: Content as Application, presented by Reid Sherline - Silverchair
Content as Application: Integrating Medical Books into the Healthcare Workflow. Presented at TOC 2011 by Reid Sherline, Vice President of Publishing for Wolters Kluwer Health, Professional and Education
Case Studies: Harnessing Speed for Competitive Advantage - VMware Tanzu
In this session we will look at existing enterprises that are successfully adopting modern approaches that give them the speed they need to compete more effectively.
Speaker: Faiz Parkar, Director EMEA GTM, Pivotal
AI in Business - Key drivers and future value - APPANION
Artificial Intelligence is undoubtedly a hyped topic at the moment. But what is the reasoning for investors and digital platform players to bet very large amounts of money on this technology right now? To better understand the current market dynamics and to give an overview of renowned predictions for the upcoming 2-3 years, we compiled a practical overview of this topic. This report covers the major driving forces of AI, assumptions for the future from industry thought leaders, and practical advice on how to start AI projects within your company.
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL - Mark Tabladillo
If you have a SQL Server license (Standard or higher) then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2014 tools including SSMS, SSIS, and SSDT.
How to become a data scientist - Thanks for the slides go to Paolo Pellegrini, Senior Consultant at P4I (Partners4Innovation) and lead for all projects related to Data Science and Big Data Analytics. Owner of the first group in Italy dedicated to Data Scientists.
Strata and Hadoop World is where data science and new business fundamentals merge. At the Strata and Hadoop World Conference, many well-known figures shared their views on Hadoop and Big Data. This presentation introduces the speakers who addressed the topic.
This is a PowerPoint presentation on Hadoop and Big Data. It covers the essential knowledge one should have when stepping into the world of Big Data.
This course is available on hadoop-skills.com for free!
This course builds a basic fundamental understanding of Big Data problems and Hadoop as a solution. This course takes you through:
• An understanding of Big Data problems, with easy-to-understand examples and illustrations.
• The history and advent of Hadoop, right from when Hadoop wasn’t even named Hadoop and was called Nutch.
• The Hadoop magic that makes it so unique and powerful.
• The difference between data science and data engineering, which is one of the big confusions in selecting a career or understanding a job role.
• And most importantly, demystifying Hadoop vendors like Cloudera, MapR and Hortonworks.
Scalable Predictive Analysis and The Trend with Big Data & AI - Jongwook Woo
The history and the latest trends of Big Data and Scalable Predictive Analysis for large-scale data sets using Distributed Machine Learning and Deep Learning with GPUs in Spark and Rapids. Invited talk at the IS department of Yonsei University, Korea.
The software development process is complete for computer project analysis, and it is important to the evaluation of the random project. These practice guidelines are for those who manage big data and big data analytics projects or are responsible for the use of data analytics solutions. They are also intended for business leaders and program leaders who are responsible for developing agency capability in the area of big data and big data analytics.
For those agencies currently not using big data or big data analytics, this document may assist strategic planners, business teams and data analysts to consider the value of big data to the current and future programs.
This document is also of relevance to those in industry, research and academia who can work as partners with government on big data analytics projects.
Technical APS personnel who manage big data and/or do big data analytics are invited to join the Data Analytics Centre of Excellence Community of Practice to share information on technical aspects of big data and big data analytics, including achieving best practice with modeling and related requirements. To join the community, send an email to the Data Analytics Centre of Excellence.
Big Data Meets the Federal Data Center - an overview of NoSQL solutions to data challenges (e.g. Hadoop, HBase, MongoDB, Cassandra, Redis, etc.). Also includes a vignette on Google Prediction API.
From Lab to Factory: Creating value with data - Peadar Coyle
One of the biggest challenges in Data Science is deploying Machine Learning models. There are cultural and technological challenges; I'll explain these and share some insights and solutions.
Collaborative Data UX Design - Virtually and Physically - Datentreiber
Many data products fail, partly because users do not understand or accept the software. To avoid this, analytics solutions such as KPI dashboards should be designed together with the users, and this is especially true for the user interface.
At the Data Brain Meetup, Datentreiber's Martin Szugat showed three wireframing tools to sketch UI designs collaboratively with the users:
1) the virtual collaboration tool Miro,
2) the PowerPoint add-on PowerMockup and
3) the physical Dashboard Wireframing Kit.
The objective of this module is to provide an overview of what the future impacts of big data are likely to be.
Upon completion of this module you will:
Gain valuable insight into the predictions for the future of Big Data
Be better placed to recognise some of the trends that are emerging
Acquire an overview of the possible opportunities your business can have with Big Data
Understand some of the start up challenges you might have with Big Data
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf - Paige Cruz
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by Rik Marselis and me from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We closed with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Elevating Tactical DDD Patterns Through Object Calisthenics - Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Generative AI Deep Dive: Advancing from Proof of Concept to Production - Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... - Ramesh Iyer
In today's fast-changing business world, companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes a lot of work. It takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
PHP Frameworks: I want to break free (IPC Berlin 2024) - Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
The new frontiers of AI in RPA with UiPath Autopilot™ - UiPathCommunity
In this free online event, organized by the UiPath Italian Community, you can explore the new features of Autopilot, the tool that integrates Artificial Intelligence into the development and use of Automations.
📕 Together we will look at some examples of using Autopilot in several tools of the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
2. Taewook Eom
Data Programmer
Plaster(Planet Master)
of Big Data Infra
Pre-Assessor of Hiring Programmers
Mentor of 101 Startup Korea
Twitter: @taewooke
LinkedIn: http://kr.linkedin.com/in/taewookeom
http://www.flickr.com/photos/oreillyconf/10616622085/
3. Santa Clara: Technical
New York (with Cloudera): Financial, Business
Europe: Privacy, Government
Boston: Medical
http://strataconf.com/
by O’Reilly
Web 2.0: Open, Sharing, Participation
Big Data: Making Data Work
Change the World with Data.
4. Data
When hardware became commoditized,
software was valuable.
Now software is being commoditized,
data is valuable.
– Tim O’Reilly, 2011
Data is like the blood of the enterprise.
– Amr Awadallah, CTO at Cloudera, 2013
5. What is Big Data?
All data that is not a fit for a traditional RDBMS,
whether used for OLTP or Analytics purposes
Big Data Architectural Patterns
http://strataconf.com/stratany2013/public/schedule/detail/30397
6. Solving 'Big Data' Challenge Involves More Than Just Managing Volumes of Data
- Gartner, 2011
http://blog.vitria.com/Portals/47881/images/3values-resized-600.png
17. Big Data Space
No one tool is the right fit for all Big Data problems
Do not be afraid to recommend the right solution
for the problem over the popular solution
To do this, you must be aware of the entire ecosystem
Big Data Architectural Patterns
http://strataconf.com/stratany2013/public/schedule/detail/30397
18. Practical Performance Analysis and Tuning for Cloudera Impala
http://strataconf.com/stratany2013/public/schedule/detail/30551
19. Hadoop and the Relational Data Warehouse – When to Use Which?
http://strataconf.com/stratany2013/public/schedule/detail/30964
20. Defining your Big Data Arsenal: NoSQL, Hadoop, and RDBMS
http://strataconf.com/stratany2013/public/schedule/detail/29968
21. Ignite
Signal Detection Theory: Man vs Machine
Co-Founder @VividCortex
Kyle Redinger
http://www.youtube.com/watch?v=Fg6mN-jevds
(5 minutes 6 seconds)
http://www.slideshare.net/realkyleredinger/man-vs-machine-signal-detection-theory-and-big-data
22. Signal Detection Theory: Man vs Machine
Remove the obvious and look at what is important
Remember: Less is more.
23. Keynote
Towards Strata 2014
Director of market research at O’Reilly Media
Roger Magoulas
http://www.youtube.com/watch?v=Ytd5VkEgQf8
(5 minutes 26 seconds)
http://strataconf.com/stratany2013/public/schedule/detail/31935
http://www.oreilly.com/data/free/files/stratasurvey.pdf
28. Science is fundamentally about data,
but data is not fundamentally about science
Beyond R and Ph.D.s: The Mythology of Data Science Debunked
Douglas Merrill (ZestFinance)
http://www.youtube.com/watch?v=J2sgObXbIWY (8 minutes 9 seconds)
29. People
A data scientist is a data analyst who lives in California.
– George Roumeliotis, (Intuit)
31. Data Businessperson: Business person, Leader, Entrepreneur
Data Creative: Artist, Jack-of-All-Trades, Hacker
Data Researcher: Scientist, Researcher, Statistician
Data Engineer: Engineer, Developer
http://datacommunitydc.org/blog/2012/08/data-scientists-survey-results-teaser/
http://cdn.oreillystatic.com/oreilly/radarreport/0636920029014/Analyzing_the_Analyzers.pdf
32. Scientists think they can code,
software engineers think they are scientists.
Team them up so they collaborate.
– Scott Sorenson (Ancestry.com)
Ancestry.com: Managing Big Data Reaching Back to the 11th Century with Hadoop
33. How Nordstrom Utilizes Human Intelligence to Blend Brick-and-Mortar with Online Commerce
http://strataconf.com/stratany2013/public/schedule/detail/30707
34. Data scientists spend their lives as data janitors
instead of leveraging their skills
– Wes McKinney (DataPad)
Building More Productive Data Science and Analytics Workflows
35. Keynote
Is Bigger Really Better?
Predictive Analytics
with Fine-grained Behavior Data
Professor at the NYU Stern School of Business
Foster Provost
http://www.youtube.com/watch?v=1jzMiAfLH2c
(10 minutes 16 seconds)
http://strataconf.com/stratany2013/public/schedule/detail/31685
36. Is Bigger Really Better?
Predictive Analytics with Fine-grained Behavior Data
37. Is Bigger Really Better?
Predictive Analytics with Fine-grained Behavior Data
38. Is Bigger Really Better?
Predictive Analytics with Fine-grained Behavior Data
Predictive does not mean actionable.
– Scott Sorenson (Ancestry.com)
Ancestry.com: Managing Big Data Reaching Back to the 11th Century with Hadoop
39. More data gives you more precision, not more prediction.
Using multiple datasets to reduce errors when measuring values.
- Ravi Iyer (Ranker.com)
Using Graphs of Data to Understand your Customers, Users, and Employees
Is Bigger Really Better?
Predictive Analytics with Fine-grained Behavior Data
40. Is Bigger Really Better?
Predictive Analytics with Fine-grained Behavior Data
41. Is Bigger Really Better?
Predictive Analytics with Fine-grained Behavior Data
42. Keynote
Big Impact from Big Data
Head of Analytics at Facebook
Ken Rudin
http://www.youtube.com/watch?v=RJFwsZwTBgg
(11 minutes 57 seconds)
http://strataconf.com/stratany2013/public/schedule/detail/31903
44. Hadoop is a hammer,
but you need other tools along with it.
Designing Your Data-Centric Organization
Josh Klahr (Pivotal)
http://www.youtube.com/watch?v=D86udfrVzrI (12 minutes)
45. Big Impact from Big Data
The way you organize information
depends on the question
you intend to ask of it.
- Richard Saul Wurman
Building a Data Platform
46. HaDump: Loading data into Hadoop for no reason.
Data Science Without a Scientist
http://strataconf.com/stratany2013/public/schedule/detail/31801
47. Big Impact from Big Data
Technical people still don't understand the business needs of business people!
Business people don't know what a table is.
- Anurag Tandon (MicroStrategy)
Inject Big Data into your Corporate DNA: Enable Every Employee to Make Data Driven Decisions
48. Ask the Right Questions
Organizations already have people who know their own data
better than mystical data scientists.
Learning Hadoop is easier than learning the company’s business.
- Gartner, 2012
Defining your Big Data Arsenal: NoSQL, Hadoop, and RDBMS
http://strataconf.com/stratany2013/public/schedule/detail/29968
49. Non-linear Storytelling: Towards New Methods and Aesthetics for Data Narrative
http://strataconf.com/stratany2013/public/schedule/detail/30207
50. Every Soldier is a Sensor: Countering Corruption in Afghanistan
http://strataconf.com/stratany2013/public/schedule/detail/30828
54. Value of Data
Usable < Useful < Actionable
with Impact
If you can't answer "so what?",
you only have facts, not insight.
- Baron Schwartz (VividCortex Inc)
Making Big Data Small
Descriptive (Easy): What happened?
Predictive (Medium): What will happen?
Prescriptive (Hard): What should we do about it?
Hadoop & Data Science for the Enterprise
55. The Future of Hadoop: What Happened & What's Possible?
Co-Founder of Hadoop
Doug Cutting
http://www.youtube.com/watch?v=_WwuZI6AhN8
(14 minutes 41 seconds)
http://strataconf.com/stratany2013/public/schedule/detail/31591
Big Data is the first industry that was created
by open source.
- Jack Norris (MapR Technologies)
Separating Hadoop Myths from Reality
Hadoop is the kernel of the OS for data.
56. Hadoop's Impact on the Future of Data Management
Mike Olson (Cloudera)
http://www.youtube.com/watch?v=puHS2JNKgRM
http://strataconf.com/stratany2013/public/schedule/detail/31380
57. Single:
- S/W & H/W system
- security model
- management model
- metadata model
- audit model
- resource management model
Common: storage & schema
http://www.slideshare.net/cloudera/enterprise-data-hub-the-next-big-thing-in-big-data
58. Last generation of data management is not sufficient
More copies, representations, transformations increase risk
Index once and reuse across workloads, lifecycle
NoSQL: indexing and updates for interactive apps
Hadoop: staging, persistence, and analytics
Data Governance for Regulated Industries Using Hadoop
http://strataconf.com/stratany2013/public/schedule/detail/30738
59. Data Intelligence
Rethink How You See Data
Sharmila Shahani-Mulligan (ClearStory Data)
http://www.youtube.com/watch?v=07hGulTOZGk (9 minutes 6 seconds)
http://strataconf.com/stratany2013/public/schedule/detail/31742
60. The Data Availability Problem
Information Supply Chain (diagram labels): Question, Access, Sampling, Modeling, Loading, Analysis & Discovery, Insight
Data Prep – too slow!
Introducing a New Way to Interact with Insight
http://strataconf.com/stratany2013/public/schedule/detail/31743
Presentation
61. Running Non-MapReduce Big Data applications on Apache Hadoop
http://strataconf.com/stratany2013/public/schedule/detail/30755
62. Apache HBase for Architects
http://strataconf.com/stratany2013/public/schedule/detail/30619
What’s Next for Apache HBase: Multi-tenancy, Predictability, and Extensions.
http://strataconf.com/stratany2013/public/schedule/detail/30857
63. Securing the Apache Hadoop Ecosystem
http://strataconf.com/stratany2013/public/schedule/detail/30302
64. An Introduction to the Berkeley Data Analytics Stack With Spark, Spark Streaming, Shark, Tachyon, and BlinkDB
http://strataconf.com/stratany2013/public/schedule/detail/30959
65. Schema
Information does not exist until a schema is defined
and data is stored in a relational database
- anonymous
Building a Data Platform
http://strataconf.com/stratany2013/public/schedule/detail/31400
66. Lessons Learned From A Decade’s Worth of Big Data At The U.S. National Security Agency (NSA)
http://strataconf.com/stratany2013/public/schedule/detail/30913
67. Managing a Rapidly Evolving Analytics Pipeline
http://strataconf.com/stratany2013/public/schedule/detail/30635
68. SQL on/in Hadoop/HBase Solutions
Stinger/Tez
Shark
Perception is Key: Telescopes, Microscopes and Data
http://strataconf.com/strataeu2013/public/schedule/detail/32351
69. All SQL on Hadoop Solutions are Missing the Point of Hadoop
Every solution makes you define a schema
- SQL (Structured Query Language) is expressed over an assumed schema
Major reasons why Hadoop has taken off include:
- Ability to load data without defining a schema
- Ability to process data using schema-on-read instead of first defining a schema
Hadoop contains a lot of:
- Raw, granular data sets with potentially inconsistent schemas
- Data sets in JSON, key-value, and other self-describing (non-relational) models designed for schema-on-read processing
SQL on Hadoop solutions that make you first define a schema are missing a major part of Hadoop’s usage patterns
Flexible Schema and the End of ETL
http://strataconf.com/stratany2013/public/schedule/detail/31868
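The schema-on-read idea from the "Flexible Schema and the End of ETL" slide can be illustrated with a small sketch. This is not taken from the talk. The records, field names (user, user_id, track_id), and the read_plays helper are hypothetical, chosen only to show raw JSON being stored without a declared schema and a schema being applied when the data is read.

```python
import json

# Hypothetical raw records, stored as-is with no schema declared up front.
# The third record has a drifted field name (user_id) and a string track_id.
RAW_LINES = [
    '{"user": "a1", "event": "play", "track_id": 17}',
    '{"user": "b2", "event": "search", "query": "jazz"}',
    '{"user_id": "c3", "event": "play", "track_id": "42"}',
]


def read_plays(lines):
    """Apply a schema at read time: filter, reconcile field names, coerce types."""
    for line in lines:
        record = json.loads(line)
        if record.get("event") != "play":
            continue  # this reader only cares about play events
        user = record.get("user") or record.get("user_id")
        yield {"user": user, "track_id": int(record["track_id"])}


plays = list(read_plays(RAW_LINES))
print(plays)
```

A different question could be answered with a different reader over the same raw lines, without reloading or redefining the stored data. A schema-first SQL layer, by contrast, would need the inconsistent records reconciled before any query could run.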
71. Hadoop Adventures At Spotify
http://strataconf.com/stratany2013/public/schedule/detail/30570
73. Quick prototyping is the fastest way to internal advocacy. Ship It!
Cloud == Speed
We don’t always need a complicated solution. KISS
Play to your differentiating strengths. Experience >> Data
Bias towards impact.
It Takes a Village
EASE!! (Emulate, Analyze, Scale, Evaluate)
How Nordstrom Utilizes Human Intelligence to Blend Brick-and-Mortar with Online Commerce
http://strataconf.com/stratany2013/public/schedule/detail/30707
Prototyping is key to overcoming resistance to change
Technical architecture is heavily influenced by people organization
Developing a team of experienced Hadoop users can often be done
using internal employees
A culture of experimentation and innovation yields the best result
Ancestry.com: Managing Big Data Reaching Back to the 11th Century with Hadoop
http://strataconf.com/stratany2013/public/schedule/detail/30499