If you're wondering what Big Data is and what benefits big data analysis can bring, we at Renerald have prepared this presentation for you. With big data analytics, you will be able to develop your strategies in light of scientific evidence and add remarkable value to your company.
This presentation covers the most fundamental concepts of big data along with example big data scenarios. It also includes tips on which sectors big data can be used in and how, as well as four engaging videos.
To browse my articles, visit velibahceci.com.
Metadata is hotter than ever, according to a number of recent DATAVERSITY surveys. More and more organizations are realizing that in order to drive business value from data, robust metadata is needed to gain the necessary context and lineage around key data assets. At the same time, industry regulations are driving the need for better transparency and understanding of information.
While metadata has been managed for decades, new strategies and approaches have been developed to support the ever-evolving data landscape and provide more innovative ways to drive business value from metadata. This webinar will provide an overview of the metadata strategies and technologies available to today's organizations, along with insights into building successful business strategies for metadata adoption and use.
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022 (GoDataDriven)
Deploy your own modern data stack from open-source components using cloud-agnostic tooling such as Terraform. By leveraging open-source components, you can deploy a state-of-the-art modern data platform in a day. What are the pros and cons of "build-it-yourself" in the data and analytics space?
Digital Enterprise Festival Birmingham 13/04/17 - Ian West Cognizant VP Data ... (CIO Edge)
Learn what the EU General Data Protection Regulation means for your business. Carrot or stick, it's your choice; but with fines of €20m or up to 4% of global revenue (whichever is larger) applied for every data breach and every data misuse after May 2018, the carrot is the better option.
Are you aware? Are you prepared? Do you comply?
To book a free, non-sales consultation about GDPR with Ian West, contact us at enquiry@digitalenterprisefest.com
The explosive growth of data and the value it creates calls on data professionals to level up their programs to build, demonstrate, and maintain trust. The days of fine print, pre-ticked boxes, and data hoarding are gone, and strong collaboration among data, privacy, marketing, and ethics teams is necessary to design trustworthy data-driven practices.
Join for a discussion on the latest trends in trusted data and how you can take critical steps to build trust in data practices by:
- Embedding privacy by design into data operations
- Respecting individual choice and optimizing the ongoing relationship with consumers
- Preparing for future data challenges including responsible AI and sustainability
New Analytic Uses of Master Data Management in the Enterprise (DATAVERSITY)
Companies all over the world are going through digital transformation now, which in many cases is all about maturing the data environment and the use of data. Master data is key to this effort. All transformative projects require master data and usually many subject areas.
What could you accomplish if cultivating master data didn’t have to be part of every project and could be accessed as a service?
We'll look at creative use cases of Master Data Management in the enterprise. We'll see what some MDM vendors are doing with AI and, by examining specific AI-influenced MDM actions, how the future of MDM will be shaped.
Data Management, Metadata Management, and Data Governance – Working Together (DATAVERSITY)
The data disciplines listed in the title must work together. The key to success is understanding the boundaries and overlaps between the disciplines. Wouldn't it be great to be able to present the relationships between the disciplines in a single, all-in-one diagram? At the end of this webinar, you will be able to do just that.
This new RWDG webinar with Bob Seiner will outline how Data Management, Metadata Management, and Data Governance can be optimized to work together. Bob will share a diagram that has successfully communicated the relationship between these disciplines to leadership, resulting in the disciplines working in harmony and delivering success.
Bob will share the following in this webinar:
- Categories of disciplines focused on managing data as an asset
- A definition of Data Management that embraces numerous data disciplines
- The importance of Metadata Management to all data disciplines
- Why data and metadata require formal governance
- A graphic that effectively exhibits the relationship between the disciplines
Data Governance and Metadata Management (DATAVERSITY)
Metadata is a tool that improves data understanding, builds end-user confidence, and improves the return on investment in every asset associated with becoming a data-centric organization. Metadata’s use has expanded beyond “data about data” to cover every phase of data analytics, protection, and quality improvement. Data Governance and metadata are connected at the hip in every way possible. As the song goes, “You can’t have one without the other.”
In this RWDG webinar, Bob Seiner will provide a way to renew your energy by focusing on the valuable asset that can make or break your Data Governance program’s success. The truth is metadata is already inherent in your data environment, and it can be leveraged by making it available to all levels of the organization. At issue is finding the most appropriate ways to leverage and share metadata to improve data value and protection.
Throughout this webinar, Bob will share information about:
- Delivering an improved definition of metadata
- Communicating the relationship between successful governance and metadata
- Getting your business community to embrace the need for metadata
- Determining the metadata that will provide the most bang for your buck
- The importance of Metadata Management to becoming data-centric
This presentation was made on June 18, 2020.
Video recording of the session can be viewed here: https://youtu.be/YEtDwYSXXJo
For many companies, model documentation is a requirement for any model to be used in the business. For other companies, model documentation is part of a data science team’s best practices. Model documentation includes how a model was created, training and test data characteristics, what alternatives were considered, how the model was evaluated, and information on model performance.
Collecting and documenting this information can take a data scientist days to complete for each model. The model document needs to be comprehensive and consistent across various projects. The process of creating this documentation is tedious for the data scientist and wasteful for the business because the data scientist could be using that time to build additional models and create more value. Inconsistent or inaccurate model documentation can be an issue for model validation, governance, and regulatory compliance.
In this virtual meetup, we will learn how to create comprehensive, high-quality model documentation in minutes that saves time, increases productivity, and improves model governance.
Speaker's Bio:
Nikhil Shekhar: Nikhil is a Machine Learning Engineer at H2O.ai. He is currently working on our automatic machine learning platform, Driverless AI. He graduated from the University at Buffalo, majoring in Artificial Intelligence, and is interested in developing scalable machine learning algorithms.
Thoughts on how to use design and data when developing customer experiences.
Slides from a 30-minute keynote given in Helsinki in March 2016.
For more information, please read my blog post on how to use customer data to provide personalized omnichannel experiences: http://affecto.com/insights/blog/every-customer-deserves-vip-treatment/
International Data Spaces: Data Sovereignty for Business Model Innovation (Boris Otto)
This presentation, given at the European Big Data Value Forum on November 13, 2018, in Vienna, introduces International Data Spaces (IDS) as a reference architecture and implementation for data sovereignty. The IDS architecture rests on usage control technologies and trusted computing environments and thus forms a strategic enabler for a fair data economy that respects the interests of data owners.
How a Semantic Layer Makes Data Mesh Work at Scale (DATAVERSITY)
Data Mesh is a trending approach to building a decentralized data architecture by leveraging a domain-oriented, self-service design. However, the pure definition of Data Mesh lacks a center of excellence or central data team and doesn’t address the need for a common approach for sharing data products across teams. The semantic layer is emerging as a key component to supporting a Hub and Spoke style of organizing data teams by introducing data model sharing, collaboration, and distributed ownership controls.
This session will explain how data teams can define common models and definitions with a semantic layer to decentralize analytics product creation using a Hub and Spoke architecture.
Attend this session to learn about:
- The role of a Data Mesh in the modern cloud architecture.
- How a semantic layer can serve as the binding agent to support decentralization.
- How to drive self service with consistency and control.
How to Strengthen Enterprise Data Governance with Data Quality (DATAVERSITY)
If your organization is in a highly-regulated industry – or relies on data for competitive advantage – data governance is undoubtedly a top priority. Whether you’re focused on “defensive” data governance (supporting regulatory compliance and risk management) or “offensive” data governance (extracting the maximum value from your data assets, and minimizing the cost of bad data), data quality plays a critical role in ensuring success.
Join our webinar to learn how enterprise data quality drives stronger data governance, including:
- The overlaps between data governance and data quality
- The "data" dependencies of data governance – and how data quality addresses them
- Key considerations for deploying data quality for data governance
Data integration is intrinsic to how modern research is undertaken in areas such as genomics, drug development and personalised medicine. To better enable this integration a large number of biomedical ontologies have been developed to provide standard semantics for describing metadata. There are now several hundred biomedical ontologies in widespread use that describe concepts such as genes, molecules, drugs and diseases. This amounts to millions of terms that are interconnected via relationships that naturally form a graph of biomedical terminology.
The Ontology Lookup Service (OLS) (http://www.ebi.ac.uk/ols) integrates over 160 ontologies and provides a central point for the biomedical community to query and visualise ontologies. OLS also provides a RESTful API over the ontologies that is used in high-throughput data annotation pipelines. OLS is built on top of a Neo4j database that provides efficient indexes for extracting ontological relationships. We have developed generic tools for loading RDF/OWL ontologies into Neo4j where the indexes are optimised for serving common ontology queries. We are now moving to adopt graph databases more widely in applications relating to ontology mapping prediction and recommendation systems for data annotation.
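The idea of ontology terms interconnected by relationships forming a queryable graph can be sketched with a minimal adjacency structure. The terms and "is_a" edges below are invented for illustration and are not taken from OLS or any real ontology:

```python
# Minimal sketch of an ontology term graph: nodes are terms,
# edges are "is_a" relationships pointing from a term to its parents.
# Term names below are hypothetical, not real ontology identifiers.
IS_A = {
    "aspirin": {"drug"},
    "drug": {"chemical entity"},
    "gene": {"biological entity"},
    "chemical entity": set(),
    "biological entity": set(),
}

def ancestors(term):
    """Walk is_a edges upward and collect all ancestor terms."""
    seen = set()
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(ancestors("aspirin")))  # ['chemical entity', 'drug']
```

Queries like this ancestor walk are the kind of traversal that a graph database such as Neo4j indexes efficiently.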
DAS Slides: Enterprise Architecture vs. Data Architecture (DATAVERSITY)
Enterprise Architecture (EA) provides a visual blueprint of the organization, and shows key inter-relationships between data, process, applications, and more. By abstracting these assets in a graphical view, it’s possible to see key interrelationships, particularly as they relate to data and its business impact across the organization. Join us for a discussion on how Data Architecture is a key component of an overall enterprise architecture for enhanced business value and success.
Denodo: Enabling a Data Mesh Architecture and Data Sharing Culture at Landsba... (Denodo)
Sylvain Dutilh, Information Intelligence Specialist, Landsbankinn
Traditional data processing leaves large pools of replicated and unsynchronized data sets behind. In an era when data grows exponentially and is disconnected and spread across silos, replicating data has never been less necessary. In this session, Sylvain from Landsbankinn will walk us through his organization's journey of implementing a Logical Data Warehouse and a data-sharing program by leveraging Data Virtualization, which allowed it to build a central, secure business rules repository and an agile, modern data mesh architecture.
http://www.saturd.ru/ - you can order sales-growth services here.
A presentation on BIG DATA in online sales. Big data here means processing and using the data collected about a person in order to make them a relevant offer.
Topics: online promotion, SaturD, promotion of medical services
Big Data for Customer centric organisation - CleverDATA for Oracle CIO Club M... (CleverDATA)
- how to know your customer so you don't lose them
- how to use a customer-centric approach
- how to get a 3D customer view
- a review of data sources
- the customer profile at every stage of the customer life cycle
- use cases
- how to build a solution architecture that uses all your data
- 1DMP.RU for Enterprise components for working with big data
- Oracle Big Data Appliance for deploying the solution
- the benefits of the 1DMP.RU solution
Solving a technology case for Aeroflot (Mikhail Alekseev)
Solution to a technology case in the Microsoft Challenge Cup championship, "Implementing Microsoft technologies in Aeroflot's infrastructure", by team ImprovY
A graph is a data structure composed of vertices/dots and edges/lines. A graph database is a software system used to persist and process graphs. The common conception in today's database community is that there is a tradeoff between the scale of data and the complexity/interlinking of data. To challenge this understanding, Aurelius has developed Titan under the liberal Apache 2 license. Titan supports both the size of modern data and the modeling power of graphs to usher in the era of Big Graph Data. Novel techniques in edge compression, data layout, and vertex-centric indices that exploit significant orders are used to facilitate the representation and processing of a single atomic graph structure across a multi-machine cluster. To ensure ease of adoption by the graph community, Titan natively implements the TinkerPop 2 Blueprints API. This presentation will review the graph landscape, Titan's techniques for scale by distribution, and a collection of satellite graph technologies to be released by Aurelius in the coming summer months of 2012.
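The vertex/edge model described above can be illustrated with a tiny property-graph sketch. This is plain Python for illustration only, not Titan's actual Blueprints API; the vertex names and edge labels are made up:

```python
# Tiny property-graph sketch: vertices carry properties, edges are
# directed and labeled, and traversal is vertex-centric (we look up
# a vertex's outgoing edges rather than scanning all edges).
class Graph:
    def __init__(self):
        self.vertices = {}   # vertex id -> properties dict
        self.out_edges = {}  # vertex id -> list of (label, target id)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props
        self.out_edges.setdefault(vid, [])

    def add_edge(self, src, label, dst):
        self.out_edges[src].append((label, dst))

    def neighbors(self, vid, label):
        """Follow only the outgoing edges with the given label."""
        return [dst for (lbl, dst) in self.out_edges[vid] if lbl == label]

g = Graph()
g.add_vertex("alice", kind="person")
g.add_vertex("bob", kind="person")
g.add_edge("alice", "knows", "bob")
print(g.neighbors("alice", "knows"))  # ['bob']
```

The vertex-centric indices mentioned in the abstract serve the same purpose as the per-vertex edge lists here, but at cluster scale and with edges sorted for fast filtered lookups.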
This presentation, by big data guru Bernard Marr, outlines in simple terms what Big Data is and how it is used today. It covers the 5 V's of Big Data as well as a number of high value use cases.
Digital Branding Summit, 15-16 October 2014 (World Brand Academy)
Alexander Filatov (Ulybka Radugi retail chain) & Ekaterina Savchenko (Synqera)
Personalization, multi-channeling & big data analysis: how do buzzwords turn into real business?
The Danger of Big Data by Kerry Bodine - Forrester Research
Service design teams can glean big data insights from social media, financial systems, emails, surveys, call centers, and digital and analog sensors. But companies that fixate on amassing new data sources put themselves at risk of neglecting small data insights gathered through qualitative research methods. How can firms achieve balance?
Big Data: what is it, and what do you do with it? Where did the term Big Data come from, what does it actually mean, and does the Big Data trend have a future? Let's find out...
An analytical overview of the Big Data market from IPOboard (Ipo Board)
This analytical review is devoted to the Big Data market.
The review covers the current situation in the international and Russian markets.
It also describes market trends and gives a forecast.
Andrzej Arshavsky, Director of the Competence Center for Super-Massive Data, Sberbank-Technologies: "Data types and a corporate platform for the full data-handling cycle"
• 19:20-19:40 Maxim Eremenko, Managing Director, Head of the Tools and Models Department, Sberbank: "How can models save or earn money?"
• 19:40-20:00 Roman Tikhonov, Managing Director, Head of the Validation Department, Sberbank: "Sberbank cases: from real-time default prediction to deep learning on natural-language data."
If you would like access to video of the talks, write to us at datascienceweek2016@gmail.com.
A logical data mart for access to big data (Sergey Gorshkov)
How can a company get the maximum value from its accumulated information? How can data from Big Data stores be integrated with traditional analytical information?
HPE Software solutions for Big Data (Yuri Yashkin)
Big Data analytics helps improve business processes and operations, strengthen risk management, and achieve additional cost savings. The document describes eight pitfalls on the way to adopting Big Data analytics.
Big data and business analytics: how to find the value? (Marina Payvina)
How to extract value from big data.
Business-analytics tools for analyzing and exploring big data.
Event: HSE Science Day 2015
Photos: http://vk.com/album-66011151_214023156
Eight pitfalls on the way to adopting Big Data analytics (Elizaveta Alekseeva)
Taming Big Data, analytics, and artificial intelligence and getting business value from them is not so simple. Learn which pitfalls await those who decide to adopt Big Data analytics and, most importantly, how to overcome them.
History of the term BIG DATA
Clifford Lynch, editor of the journal Nature, first used the term BIG DATA in 2008, in a special issue of the journal devoted to the question "How might technologies that open up the ability to work with large volumes of data affect the future of science?". The issue collected material on the phenomenon of explosive growth in the volume and variety of processed data, and on the technological prospects of a likely leap "from quantity to quality".
In 2009 the term spread widely in the business press, and the appearance of the first products and solutions addressing exclusively and directly the problem of processing big data is dated to 2010. By 2011 most of the largest enterprise IT vendors, including IBM, Oracle, Microsoft, Hewlett-Packard, and EMC, were using the notion of big data in their business strategies, and the leading IT market analysts were devoting dedicated studies to the concept.
In 2011, Gartner ranked big data as the number-two trend in IT infrastructure (after virtualization, and as more significant than energy saving and monitoring). The adoption of big data technologies was predicted to have the greatest impact on IT in manufacturing, healthcare, retail, and public administration, as well as in areas and industries where individual movements of resources are recorded.
3. So what is BIG DATA?
A group of technologies and methods for the high-performance processing of dynamically growing volumes of data (structured and unstructured) in distributed information systems, supplying the organization with qualitatively new, useful information.
Big Data refers to data sets so large that traditional tools cannot capture, manage and process them within a practically acceptable time.
Big Data technology helps unlock the commercial potential of massive data sets by discovering valuable patterns and facts through combining and analyzing large volumes of data.
4. Volume, Variety, Velocity
Volume: genuinely large volumes of data in the physical sense (1 GB, 1 TB, 1 PB, 1 EB, 1 ZB)
Variety: weakly structured, heterogeneous data (databases, XML, logs, texts, video, audio)
Velocity: the need for high-speed data processing
5. Corporate data volumes by industry, 2012
[Bar chart: stored data by industry, in terabytes (axis 0 to 5,000 TB), covering securities, banking, healthcare, energy, government, insurance, and communications and media. Source: McKinsey]
6. Internet and mobile technologies
Twitter: 175 million tweets per day
Facebook: 300 million photos uploaded daily
Google: 24 PB processed daily
AT&T: carries 30 PB per day
Walmart: more than 1 million sales transactions per hour
Data sent to and from mobile devices: 1.3 exabytes
7. Key analysis technologies in Big Data
MapReduce: a framework, developed by Google, for computing certain classes of distributed tasks across a large number of computers (called "nodes") that form a cluster.
Hadoop: a set of utilities, libraries and a software framework for developing and running distributed programs on clusters of hundreds and thousands of nodes.
NoSQL: a range of approaches to implementing data stores that differ substantially from the models used in traditional relational DBMSs accessed via SQL. The term applies to databases that attempt to solve problems of scalability and availability at the expense of atomicity and consistency of data.
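The MapReduce data flow described above can be sketched in miniature. This is a toy, single-process illustration of the word-count pattern, not how Hadoop or Google's framework is implemented: real systems distribute the map, shuffle and reduce phases across cluster nodes, while here each phase runs locally to show the flow of (key, value) pairs.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in a document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data is big", "data is everywhere"]
mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = reduce_phase(shuffle_phase(mapped))
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each mapper and each reducer works independently on its own slice of the data, the same three phases scale out naturally to thousands of nodes.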
9. Analysis methods used in Big Data
What makes the big data approach unique is the aggregation of a huge volume of unstructured information from many different sources in one place.
Classification (categorizing new data using rules previously derived from existing data)
Cluster analysis
Regression analysis
Recommender systems
Artificial neural networks and genetic algorithms
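As a concrete taste of one method from the list above, here is a minimal 1-D k-means sketch of cluster analysis, using only the Python standard library. It is a simplified illustration, not a production algorithm: points are assigned to the nearest centroid, centroids are recomputed as cluster means, and the loop stops once assignments stabilize.

```python
import random

def kmeans_1d(points, k, iters=100, seed=0):
    """Cluster 1-D points into k groups; return sorted centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct starting centroids
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # assignments stabilized
            break
        centroids = new_centroids
    return sorted(centroids)

data = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]
print(kmeans_1d(data, 2))  # two centroids, near 1.0 and 10.0
```

With two well-separated groups of points, the centroids converge to the group means; on real big-data workloads the same assign/update steps are run in a distributed fashion over far larger, multi-dimensional data.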
10. The industries most advanced in Big Data
01 Marketing: market segmentation; modeling customer acquisition and churn; recommender systems; social media analysis
02 Finance: detecting anomalous behavior; credit risk analysis; insurance modeling
03 Medicine: genetic analysis; clinical trial analysis; expert systems
11. Value for business
Given the scale involved, business faced the task not only of choosing adequate tools for analyzing information, but also of building an optimal computing infrastructure that would be both effective and not too expensive.
Indeed, large data warehouses in financial services, telecommunications, retail and government organizations have existed for many years. Real-time data processing solutions were used to manage business processes, for example in trading, as were high-performance computing systems for scientific research. The difference is that systems which used to solve isolated business problems at large enterprises are today becoming the foundation for executing their business strategy. Big Data technology makes it possible to reduce spending on IT infrastructure and software; to cut labor costs through more effective methods of data integration, management, analysis and decision-making; and to increase revenue and profit through new or more effective ways of doing business. In other words, at the current stage the same technologies deliver a qualitatively new kind of value to the enterprise.
12. Case study: how can a company learn your secrets?
The Target store chain and a pregnant schoolgirl, USA, 2012
13. Thank you for your attention!
Mikhail Alekseev
alekseev.miha@gmail.com
LinkedIn
Facebook
VK
Editor's Notes
In the world of big data we can analyze an enormous amount of data, and in some cases process ALL the data concerning a given phenomenon, rather than relying on random samples.
Two years ago the huge Target retail chain began using machine learning in its interactions with shoppers. Data the company had accumulated over several years served as the training set, and bank cards and personalized discount cards were used as markers identifying individual shoppers. The algorithms analyzed how, and under what conditions, shoppers' preferences changed, and made predictions; on the basis of those predictions, shoppers were sent all kinds of special offers. In the spring of 2012 a scandal broke out when the father of a twelve-year-old schoolgirl complained that his daughter was being sent brochures with offers for pregnant women. Just as Target was preparing to admit its mistake and apologize to the offended customers, it emerged that the girl really was pregnant, although neither she nor her father knew it at the time of the complaint. The algorithm had picked up changes in the shopper's behavior characteristic of pregnant women.