ABOUT THE AUTHORS
Jeff Fried, CTO, BA Insight
Jeff is a long-standing search nerd. He was the VP of Products for semantic search company LingoMotors, VP of Advanced Solutions for FAST Search, and technical product manager for all Microsoft enterprise search products. He is also a frequent writer who has authored 50 technical papers and co-authored two new books on SharePoint and search. He holds over 15 patents and routinely speaks at industry events.
Agnes Molnar, MVP
Agnes is a Microsoft SharePoint MVP and a Senior Solutions Consultant for BA Insight. She has also co-authored and contributed to several SharePoint books. She is a regular speaker at technical conferences and symposiums around the world.
Michael Himelstein, vTSP
Michael has more than 20 years of practical experience developing, deploying, and architecting search-based applications. In this role he has advised hundreds of the largest companies around the world on unified information access. He was previously a Technology Solutions Manager in the Enterprise Search Group at Microsoft.
Tony Malandain
Tony Malandain is a co-founder of BA Insight. Tony architected and built the first version of the product, which gained significant momentum on Microsoft Office SharePoint Server (MOSS) and positioned BA Insight as the leading Enhanced Search vendor for SharePoint. Tony was awarded a patent for the core AptivRank technology, which monitors the usage behavior of search users to influence relevancy automatically.
Eric Moore
Eric Moore is the lead for BA Insight’s Search Interactions and Content Enrichment teams. He is accustomed to living at the leading edge of search, and has deep experience with multimedia search, XML search, and content enrichment. Prior to BA Insight, Eric worked for five years at FAST and on the Microsoft Search Platform team. Eric has developed state-of-the-art products, algorithms, and platforms for specialized information workers.
SHAREPOINT 2013 THE ESSENTIAL GUIDE TO ENTERPRISE SEARCH
WHAT’S IN THIS E-BOOK? INTRODUCTION
There’s a lot to say about SharePoint 2013, and about search in SharePoint 2013. This e-book is focused only on search, and is meant to give you a working understanding of the new features so that you can get oriented with them and think about how you will deploy and use them. It does not try to cover everything, nor is it meant to be a hands-on guide. In this book we will be covering five key areas as they relate to search. These key areas are color coded, and represented by the blocks below. Each section contains short chapters that can be read independently or continuously. The goal is to enable readers to focus on the information they need to learn about at the moment.
User Experience
Working with Queries & Results
Working with Content
Architecture, Deployment & Operations
Applications & Development
Not every area of search has changed in SharePoint 2013, and those who are currently familiar with search won’t be lost at sea. For example, the deployment model, services architecture, and crawling and connector subsystems are pretty much the same as with SharePoint 2010. End users will see a dramatically different search UI, but they will be able to use it with no training (it’s quite intuitive). If you have built up a competency in search, you’ll be able to take it further in many ways, which we highlight throughout this e-book.
Deeper Dives
• TechNet: What’s new in SharePoint 2013 search
• Blog article from Microsoft Search Group
• TechNet landing page refreshed weekly with articles on SharePoint 2013
• Highlights of Search in SharePoint 2013
Highlights and Key Take-Aways
User Experience
WHAT’S NEW?
The face of search is totally revamped, not just in keeping with the new SharePoint UX overall, but with deep refinements, better display for results using Result Blocks, a hover panel with previews, and more.
BENEFITS
The search experience is easy, clean, and fast.
Working with Queries & Results
WHAT’S NEW?
In SharePoint 2013, search scopes, federated locations, and best bets are now deprecated in favor of result sources, query rules, and result templates.
BENEFITS
SharePoint 2013 is light-years ahead of other search platforms in this area. Result sources, query rules, and result templates offer remarkable control over search presentation. These are brand-new concepts, well worth learning: they arm site administrators and site collection administrators with the tools to field powerful, effective search.
Working with Content
WHAT’S NEW?
• Crawling is the area that has changed least with SharePoint 2013, but there are still some great enhancements, including continuous crawling.
• Business Connectivity Services has continued to evolve and now supports claims tokens through the BDC.
• The Content Processing and Linguistics capabilities in SharePoint 2013 search are very strong and extensible. There are many new capabilities, including a completely new file parsing mechanism.
BENEFITS
• With continuous crawling, users get fresher content faster.
• Complex security scenarios are more tractable (though still hard).
• This platform offers a lot of power to developers, as well as providing some key capabilities end users will notice.
Architecture, Deployment & Operations
WHAT’S NEW?
Under the hood, there is a new architecture, a new search core, and many new modules that are the culmination of the FAST acquisition. These do not just combine the best of FAST and SharePoint search; they also include some significant innovations from a continued investment in search.
BENEFITS
Search deployment and management is different, and largely better. Making search hum for O365 (fully multi-tenant, smoothly scalable and fault-tolerant, and manageable at multiple levels) was a key goal for this release, and there are big benefits for on-premise deployments too.
Applications & Development
WHAT’S NEW?
• There’s a new development model for SharePoint 2013 generally, and for Search specifically.
• There’s a new Content Enrichment Web Service (CEWS) that opens up content processing for extension.
• Search is used pervasively throughout the SharePoint 2013 platform, and powers the new web content management (WCM) and e-discovery capabilities, topic pages, the content-by-search web part, myTasks, mySiteView, and more, along with great enterprise search, people search, and site search.
BENEFITS
• This makes extending search much more accessible, and will foster a lot of exciting search-based applications.
• A lot of great possibilities are now open to developers.
• Your users will get more done and enjoy a variety of applications, both built in and tailored, all powered by search.
TABLE OF CONTENTS
Introduction. SharePoint 2013 Search is Here
Chapter 1. User Experience: The New Face of Search in SharePoint 2013
• Raising the Bar: The SharePoint 2013 User Experience
• First Class Search Interactions: More to Love
• The SharePoint 2013 Search Center Overview
• Refiners and Faceted Navigation
• Search Center Setup
Chapter 2. Working with Queries and Results: New Mechanisms in SharePoint 2013
• Query Processing: the Search Engine’s Automatic Transmission
• Query Rules and Query Suggestions
• Result Types and Result Templates
Chapter 3. Working with Content: Crawling, Connectors, and Content Processing
• Content Capture
• Content Processing
• Linguistics Processing
Chapter 4. Architecture, Deployment, and Operations: Getting under the Hood
• New Architecture, Single Search Engine Core
• Indexing and Partitions
• Analytics
• Federation and Result Sources
• Search in Exchange
• Search Administration
• Upgrade and Migration
Chapter 5. Applications and Development: New Models for Search-Based Applications
• The New Development Model in SharePoint 2013
• The Content Enrichment Web Service (CEWS)
• Search-Based Applications in SharePoint 2013
Conclusion
INTRODUCTION
SharePoint 2013 Search is Here
There’s a New Search in Town
SharePoint 2013 has arrived, and it is chock full of new capabilities and features. This is a release with major architectural changes, built “for the next 15 years”, and it is very different from SharePoint 2010. With SharePoint 2013, the enterprise search capabilities are dramatically different and very exciting. Search has a new face, a new development model, and some remarkable built-in features. For search Jedis, this new platform has a lot to love:
• It is clean, fast, and easy to use.
• It is straightforward to install, administer, and scale.
• It provides very powerful high-end search features.
• It makes creating search-based applications simpler than ever.
For search Jedi apprentices, this release will change your world. Search is the “Force” used pervasively throughout SharePoint 2013 and has the power to transform the way your business uses SharePoint. What is intriguing about this release is that it’s very clear that Microsoft’s investment and innovation around search hasn’t stopped; it has accelerated. They’ve hit a key design target (easy, powerful search that runs on premise or in the cloud) right on the money. Since this release is a key architectural change for SharePoint and a huge architectural change for search specifically, there are also many new features to build on. Peeking under the hood, there is evidence that there’s more innovation to come in future releases: powerful new mechanisms which aren’t fully used yet.
This isn’t a perfect release. There are some things that take getting used to, some areas that still need sanding, and some situations where you need to write code or turn to partners to boost the power of your search capabilities. We’ll point out some of these areas where you can turbocharge your search in this e-book. Search technology (and basically all software that does sophisticated things around human language) is extremely hard in general. High-end search is very powerful, and can be applied in a myriad of situations, so covering everything is at odds with making search easy. The approach of providing hooks for extensibility and encouraging partners and customers to use them works, and Microsoft has a great set of partners to pull this off. Search is still hard; don’t let the easy, simple user experience fool you into thinking otherwise. But Microsoft has done a remarkable job making this high-end technology accessible and easy for the mainstream. You will get enormous benefit from this release, so get to know it.
CHAPTER 1
User Experience – The New Face of Search in SharePoint 2013
CHAPTER 1 THE NEW FACE OF SEARCH IN SHAREPOINT 2013
Raising the Bar: The SharePoint 2013 User Experience
User experience broadly characterizes the way that people work through user interfaces, information, and product-specific concepts to get work done. SharePoint’s users can broadly be divided into two groups:
• Business End Users: regular line-of-business users who use SharePoint for specific tasks and projects.
• IT Users: IT professionals who manage, configure, and customize SharePoint for business users.
For any new generation of a product, user experience goals are straightforward: make it easier for the user to get work done faster, cheaper, and better. A simple, intuitive, attractive design also helps. Consumers expect ease of use and a certain amount of slickness when it comes to interacting with products; the bar is high when it comes to how they can get work done. With SharePoint 2013, there are several developments surrounding user experience that business users can look forward to:
• Modern UI/Windows 8 Look and Feel: The new look and feel confronts users with the most “radical” update in 20 years (UI news link below) to prepare for a multi-device world. This look and feel for the Windows operating system supports mobile and has the ability to boost productivity for an increasingly mobile work force.
• Mobile and Tablet Deployment: Support for fluid layouts, touch, and voice interaction means that using SharePoint on Microsoft’s Surface tablet and the Apple iPad is much easier and smoother. Users can access information anywhere, at any time, with the same ease of use they’re familiar with from their desktop.
• SharePoint 2013 and Applications: The bar is also going up when it comes to ease of access to information. SharePoint 2013 is able to field experiences that are mobile and search driven, for customer-facing as well as employee-only sites. There are a variety of full-fledged applications that run on your desktop, in your browser, and on leading mobile devices and present new ways to access and interact with SharePoint information, further enhancing the user experience and productivity.
Open for Designers
If you’re familiar with SharePoint you know that you can customize your interface to make it look nearly any way you want to, but you also know that the vast majority of business users leave the look and feel as the default and never change it. With SharePoint 2013, you no longer use PowerPoint to create themes in a proprietary format. It’s easy to theme sites using HTML (including support for HTML5), as shown below. This opens up SharePoint design to a much wider range of customization by designers, and will result in a lot of very attractive SharePoint sites.
The SharePoint 2013 user experience is a platform-wide update, ready for a new generation of interaction. Changes in the underlying presentation tier, service architecture, Object Model (OM), and Office Apps all further the goal of making it easier to configure and deploy valuable applications in this new delivery environment.
Mobile Challenges and Opportunities: Windows 8 and Metro
Windows 8 devices have a new interaction flow. The desktop, charms, apps, and tiles are distinctly different from the familiar Windows 7 desktop. This represents an opportunity for application developers to create truly engaging user experiences that work across many devices. However, it also poses a challenge for developers to learn the Windows 8 stack, and a learning curve for users. Touch is highly intuitive and highly engaging; that said, the question is how and when users will gain their first experience and confidence with idiomatic Windows 8. Will learning be amortized in the context of your project, or someone else’s? Metro, as seen so far in SharePoint 2013, is a sparer, less dense way of presenting information, which is good from a user experience perspective. It also means there is less information displayed per page of results, and that decrease may trouble users who rely on “recall” over “precision” in their browsing and scanning. The solution to this problem may be to present information more effectively, to make the less count for more. In order to provide richer results, the design and consequent development of processing and enrichment processes will require new skills from the SharePoint application developer.
Deeper Dives
• TechNet on mobile devices and SharePoint 2013
• Blog with highlights of Design Features in SharePoint 2013
• Article on Windows 8 UI
• SharePoint 2013 UI blog
First Class Search Interactions: More to Love
SharePoint 2013 has revamped the user experience overall (not just for search), and offers nice user experience improvements for everyone. Highlights of the previous release, SharePoint 2010, included the roll out of the “ribbon” across all of Office and SharePoint, and the first roll out of Office Web Applications. Search-specific developments for the end user on the SharePoint 2013 platform include a flatter, cleaner, and more responsive interface. The “flatness” comes from a top-down design that makes transitions between SharePoint Views (sites and document libraries), Search Views (search sites), and Detail Views (snippet and document) invisible. The improved responsiveness comes from the new architecture of the SharePoint 2013 presentation tier, which extensively uses modern HTML, JavaScript, and AJAX-style interactions with responsive SharePoint search and metadata services.
But that’s not all, folks. There’s a lot more to appreciate about the new search user experience in SharePoint 2013:
• Document Previews: Office documents are rendered in the page for easy viewing, so there’s less interruption going from one view to the next.
• Interactive Elements: Fly outs or hover card patterns are implemented quickly and cleanly. Search results fly in, and additional information about what you are looking for is available with a flick of the mouse.
• Transitions Across SharePoint Tasks: The disjunction between “contextual search” and “search sites” is gone in SharePoint 2013. There are fewer obvious differences between apps; this version of SharePoint does not feel stitched together like previous versions. New developments include the seamless flow between functions such as people search and search verticals.
• Productivity: Search helps users quickly return to important sites and documents by remembering what they have previously searched and clicked. Previously searched and clicked items are displayed as query suggestions at the top of the results page.
• Search Mechanisms Under the Hood: Queries, interpreting queries, returning relevant results, and the presentation of those results are pervasive across SharePoint 2013. It’s not always obvious that search is “there”, but search technologies are used across the SharePoint 2013 platform, and key new interfaces lower the complexity of the customization that IT professionals and application developers need to do in order to support business users.
Search powers a number of areas which may or may not be obvious as search:
• Upgrades to People Search and Social Features: making it easy to explore and find people, expertise, and conversations that are important to the task at hand.
• New Social Features: My Sites, Communities, Teams, and Conversations create dynamic content that is quickly indexed via constant incremental crawls and returned through SharePoint 2013 search.
• Personalization Features: search suggestions are personalized, and include visited documents, as described in the chapter on query rules and query suggestions.
These show up “as if by magic”, and many users enjoy them without thinking about search at all. Overall, the search interfaces are clearer and brighter, and all the different parts of SharePoint apps seem to work better together. It is also much easier to customize search-driven experiences in SharePoint 2013 than with any other enterprise search platform.
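To make the personalization idea concrete, here is a minimal Python sketch of suggestions drawn from a user's own history. It is not how SharePoint implements the feature: the `SearchHistory` class, its method names, and the example URL are all hypothetical, and the ranking is simply "most frequent first".

```python
from collections import Counter


class SearchHistory:
    """Toy per-user history of queries and clicked results (hypothetical model)."""

    def __init__(self):
        self.queries = Counter()  # normalized query text -> times issued
        self.clicks = Counter()   # (title, url) -> times clicked

    def record_query(self, query):
        self.queries[query.strip().lower()] += 1

    def record_click(self, title, url):
        self.clicks[(title, url)] += 1

    def suggest(self, prefix, limit=3):
        """Return past queries starting with the prefix and visited items containing it."""
        p = prefix.strip().lower()
        past_queries = [q for q, _ in self.queries.most_common() if q.startswith(p)]
        visited = [
            title
            for (title, _url), _ in self.clicks.most_common()
            if p in title.lower()
        ]
        return {"queries": past_queries[:limit], "visited": visited[:limit]}


history = SearchHistory()
history.record_query("budget 2013")
history.record_query("Budget 2013")
history.record_query("building access")
history.record_click("Budget 2013 Final.xlsx", "http://intranet.example/docs/budget.xlsx")

suggestions = history.suggest("bu")
print(suggestions)
```

The point of the sketch is only that suggestions can be computed from what a user already searched and clicked, without the user doing anything explicit; a real system would also weigh recency and security trimming.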
The New Face of Search
Search in SharePoint 2013 has a completely different look and feel from previous versions; the UI has been largely rewritten. The new face of search in SharePoint 2013 is easy to use, clean, and intuitive. It offers easy exploration and navigation of information while presenting information in an actionable format. This is a far cry from the ten blue links concept that the industry has been living with for nearly 20 years. There are also a number of changes that have been made to enhance ease of access to information, supporting both productivity and mobility. While the out of the box interface is clean, and we view this as a positive enhancement, it is not as information-dense as heavy search users demand. There are a number of search-based applications that can bridge user requirements surrounding information access and analysis, and we will provide several next steps and options for review at the end of this e-book.
Deeper Dives
• Search User Interfaces book by Marti Hearst
The SharePoint 2013 Search Center: Overview
The SharePoint 2013 Search Center has inherited the new look of SharePoint 2013; overall it is clean, modern, and dynamic. As you can see from the screenshot below, it is quite different from what you are used to seeing. The familiar tabbed interface is apparent, but it has a more streamlined look and feel and includes some new out of the box tabs such as videos. There are also more actions that can be done directly from the search interface, including a hover panel. Some of the capabilities from FAST show up in this release as well, deep refiners and document previews in particular. These have been taken to the next level with additional features such as the ability to show histograms for dates and to search inside the refiners. While both capabilities are welcome, they are somewhat limited, whetting your appetite for more.
*Note: The refiner counts are turned off by default, but they appear with one click in the web part configuration panel.
Document Previews and the Hover Panel
One of the most exciting new features added to SharePoint 2013 is the integration of document previews right within the search results. This feature leverages a new standalone server that hosts Office Web Applications. With Office Web Apps, users can now open a document in a web client environment with reasonably high fidelity while preserving format, fonts, sizing, etc. A key component within the document preview display is the “take a look inside” functionality. This provides the ability to jump specifically to a relevant section of the document, based on extraction of sections for several document types. For example, because it is likely that the slide titles in a PowerPoint presentation were designed by the presenter to summarize the content of each slide, these titles are extracted and shown as links. This feature is also available for Word documents and Excel documents (focused on graphs and named tables) as well as SharePoint sites (top sub sites and document libraries).
There are some limitations to the document preview features in SharePoint 2013. Previews are relatively slow and are missing functionality that other preview products take for granted. This includes search term hit highlighting, the ability to immediately jump to the most relevant page of the document, and copy and paste from within the preview. Breadth of content types is another area where SharePoint 2013 previews fall short: they are only available for content hosted in SharePoint, and only for a limited set of file formats (for example, Word and PowerPoint, but not PDF). This preview technology was not designed for consuming documents through the preview interface, but rather for determining whether this is the particular document that you have been looking for. Notwithstanding these limitations, though, document previews are a boon to the user and a great addition to search.
The hover panel paradigm works well in the Search Center. It can be customized and may vary based on content type or tab. Default actions with document preview include the Edit, Send, and View Library features, as well as Follow, a social feature. They also allow some actions directly from the search page, including editing content directly in Office Web Apps. For many applications, people will want to customize the search center, because it is not as information-dense as heavy search users or search-based applications demand. This type of customization is easy to do, and we’ll cover it later in the chapters about query rules, result sources, and the development model.
People Search
People Search is another strong part of the Search Center. As with SharePoint 2010, people search lights up with actions when used together with Lync, and phonetic search is available by default. The new hover panel provides a great way to show profiles and content, in addition to social connections.
Overall, the SharePoint 2013 Search Center interface is better than any other search UI we’ve seen on the market. It appears to be very robust, and holds true to Microsoft’s ‘works anywhere’ commitment. It functions smoothly both in the cloud with Office 365 and on premise, as well as in all major browsers (Internet Explorer, Mozilla Firefox, Chrome), and the experience on tablets like the iPad is pretty good. A word to the wise: just don’t let a sexy demo or quick test drive lull you into thinking that ‘it just magically works’. As with all search products, the navigation depends on having decent metadata. Overall the out of the box interface is clean, fast, and provides relevant results, so the basic ‘must have’ elements of great search are covered. There are also a lot of exciting capabilities that make exploration easier, give users insight, and enable action directly from search.
Of course, everything works better with search when all of the products that are part of the search machine are Microsoft products. For example, people search lights up with actions when used with Lync; myTasks works with Project Server; and previews work only for documents stored in SharePoint in recent Office formats, and require a separate Office Web Apps (OWA) server. If you don’t have servers that run these other products, the additional features associated with them simply don’t show up. However, search still works very well even without them. When you have all these parts in place, though, they work extremely well together, which is a big accomplishment for Microsoft with strong productivity benefits to the end user.
Deeper Dives
• TechNet: Creating a search center in SharePoint 2013
• Intro to the hover panel
• Longitude Search Overview
Refiners and Faceted Navigation
Less than ten years ago, the idea of using faceted metadata for flexible search and navigation was just being hatched in an academic research project called the Flamenco project. Now it is de rigueur: it has proven to be effective, and enterprise search without it is subpar. Microsoft added search refinement in SharePoint 2010, with the refiners populated by whatever content is in the associated managed properties. SharePoint 2010 created refiners out of the top N results (called “shallow refiners”; the top 50 results was the default), and FAST Search for SharePoint created “deep refiners” out of the entire result set, even if it was millions of items.
With SharePoint 2013, there are now two different modes for the refiner web part: standard search results, and faceted navigation. For standard search results, refiners are generated as they were with FAST Search for SharePoint. You can now define display templates to use for rendering different kinds of refinements, which is a big win over SharePoint 2010. All refiners are now deep refiners.
Faceted navigation is more dynamic. It is used in conjunction with term sets (served from the term store), which are also used for navigation in document libraries. With faceted navigation, a term from the term store filters what kind of data should display. If the managed property is ‘refinable’, the refiners that show can depend on the term. This is handy in many search scenarios, including the online store scenario which inspired it. For example, users can use faceted navigation in an online store to find products more easily. The scenario below uses the term store terms Camera and Laptop and the managed properties Megapixel Count, Color, and Manufacturer. So, with faceted navigation your terms would look like this:
• For the term Camera, add refiners for Megapixel Count and Manufacturer
• For the term Laptop, add refiners for Color and Manufacturer
The refiners that show up are now based on that term, which can be set based upon a page or catalog hierarchy, so that you get the same term-specific refiners whether you navigate or search your way to laptops.
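The Camera and Laptop scenario can be sketched in a few lines of Python. This is only an illustration of the idea that the current term selects both the items and the refiners shown; the `REFINERS_BY_TERM` table, the `facets_for` function, and the catalog items (including the manufacturer names) are made up for the example and are not a SharePoint API.

```python
# Term-store terms mapped to the refiners configured for them (from the scenario above).
REFINERS_BY_TERM = {
    "Camera": ["Megapixel Count", "Manufacturer"],
    "Laptop": ["Color", "Manufacturer"],
}

# Hypothetical catalog items; keys mirror the managed properties in the scenario.
CATALOG = [
    {"term": "Camera", "Manufacturer": "Contoso", "Megapixel Count": 16, "Color": "Black"},
    {"term": "Camera", "Manufacturer": "Fabrikam", "Megapixel Count": 12, "Color": "Silver"},
    {"term": "Laptop", "Manufacturer": "Contoso", "Color": "Silver"},
    {"term": "Laptop", "Manufacturer": "Contoso", "Color": "Black"},
]


def facets_for(term, items=CATALOG):
    """Filter items by term, then count values for that term's refiners only."""
    matching = [item for item in items if item["term"] == term]
    facets = {}
    for refiner in REFINERS_BY_TERM.get(term, []):
        counts = {}
        for item in matching:
            if refiner in item:
                value = item[refiner]
                counts[value] = counts.get(value, 0) + 1
        facets[refiner] = counts
    return facets


laptop_facets = facets_for("Laptop")
camera_facets = facets_for("Camera")
print(laptop_facets)
print(camera_facets)
```

Note that cameras in this toy catalog have a Color value, but Color is not offered as a refiner for the Camera term because it was not configured for it. That is the metadata-driven behavior the text describes.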
Configuring these refiners via the term store is convenient, and there are built-in tools that make it easy to create a hierarchy, customize the refiners within the hierarchy, and set up a very dynamic experience, as shown below.
Navigation and Search Unified
Hierarchy is also used to create results pages, as part of the web content management (WCM) capabilities of SharePoint 2013. Navigation settings are based on the same hierarchy, so that users can search, navigate, or refine their way to their result. Navigation controls also have built-in customization, as shown below.
As you can see, Faceted Navigation is quite a powerful capability. Refiners are available everywhere, they adjust dynamically, and they can be configured to an exact design, all controlled by metadata. All refiners used in Faceted Navigation are deep refiners, so there are no gaps caused by a missed item in the deeper result set.
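The difference between shallow and deep refiners is easy to see with a small Python sketch. The `refiner_counts` function and the sample result set are hypothetical; the only detail taken from the text is that shallow refiners look at the top N results (50 by default in SharePoint 2010) while deep refiners cover the entire result set.

```python
from collections import Counter


def refiner_counts(results, prop, top_n=None):
    """Count values of `prop`; top_n=None means a deep refiner (whole result set)."""
    scope = results if top_n is None else results[:top_n]
    return Counter(item[prop] for item in scope if prop in item)


# Hypothetical ranked result set: 50 Word documents followed by 5 PDFs.
results = [{"FileType": "docx"}] * 50 + [{"FileType": "pdf"}] * 5

shallow = refiner_counts(results, "FileType", top_n=50)  # top 50 results only
deep = refiner_counts(results, "FileType")               # every matching item

print(shallow)
print(deep)
```

In this made-up case the shallow refiner never offers "pdf" as a choice, because all five PDFs rank below position 50. The deep refiner counts them, which is the "no gaps" property described above.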
Deeper Dives
• TechNet: Managed metadata overview
• TechNet: Configure faceted navigation in SharePoint 2013
Search Center Setup
For the IT Professional, SharePoint 2013 offers more control over the logic of search applications, and it exposes that control in a clear, consistent, and logical model. We’ve outlined key concepts as they relate to Search Center setup and how they are used to deliver search results.
• Query Configuration: Query Rules are used to control ranking, query intent classification, synonyms, and query rewriting in SharePoint 2013.
• Presentation Configuration: Query Rules and Display Templates determine what result snippet gets shown for what class of query, what type of document, and what category of user. The integration of search query processing across the platform means that display templates and query rules are applicable throughout the application.
• Faceted Navigation: Metadata used for top-down navigation (“Faceted Navigation”) and metadata exposed as search results for bottom-up refinement are now both managed through the term store.
On Premise or In the Cloud? Get Going Faster
Setting up a new search center is pretty straightforward. As illustrated below, site administrators can easily set up a SharePoint 2013 search center to run on premise or in the cloud.
The Search Center itself is a site template, and the good news is that with this latest release some of the rough edges from SharePoint 2010 have been removed. For example, this template now inherits design elements from a master page, so you don’t need to jump through hoops to make it match your design. This does not mean that you don’t still need to think about how to manage the ‘universal search center’ — which may serve many site collections with different themes and designs — but you now have easier control.
Changes to Sites and Site Templates
There are a number of changes to sites and site templates overall in SharePoint 2013. The facilities for sharing (requesting and granting site permissions) are completely revamped and considerably improved, as shown in the screenshot below.
The Document Workspace site template has been removed in SharePoint 2013, simplifying the list of templates available when a new site collection is created. This will be a big change to users since this template was a workhorse in SharePoint 2010. Most Meeting Workspace site templates from 2010 have also been discontinued in SharePoint 2013, including the basic, blank, decision, and social meeting workspace templates and the multipage meeting template. They have been replaced by features from other parts of SharePoint and from OneNote and Lync, which all support collaborative work, live conferences, smaller meetings, note-taking, and storage of notes and other conference-generated commentary. The benefit is that projects with multiple contributors and collaboration across geographically distributed teams are streamlined.
The facilities for web content management (covered in the Applications section) are remarkably improved, and totally driven by search. This makes creating externally-facing sites and applications much more effective. If you have responsibility for explaining and exposing a service or product to a market inside your company, the business-focused features that are new in SharePoint 2013 are a strong proposition for inside-the-firm audiences. For example, if you provide consulting services internally for a legal practice area, recommendations, customization of the search experience based on queries, personalized interaction, and similar features enable users to find relevant information more quickly.
Deeper Dives
TechNet: Creating a search center in SharePoint 2013 »
Blog on using the Content by Search Web Part »
CHAPTER 2
Working with Queries and Results – New Mechanisms in SharePoint 2013
CHAPTER 2 NEW MECHANISMS IN SHAREPOINT 2013
Query Processing: The Search Engine’s Automatic Transmission
The search experience involves many different processes, so creating a great search experience requires covering everything, from the moment information is pulled from the source systems to the moment it is presented to the user in search results. SharePoint historically had strong coverage on the crawling side via its Business Connectivity Services and Protocol Handler framework, and strong coverage on the presentation side via its XSLT-driven core results web parts. FAST Search for SharePoint on the 2010 platform then brought coverage of the content processing area via its pipeline extensibility framework as well as its built-in entity extractors. SharePoint 2013 completes the coverage by providing a strong query processing framework, shown below.
Query Processing in Action
But what does “query processing” mean exactly? If you’re familiar with SharePoint 2010, think of query processing as the evolution of search scopes, federated locations, and best bets. With intranet search indexes now frequently reaching tens of millions of items, formulating the right query is more and more critical to finding relevant information. Fortunately, there are a number of techniques you can use to reformulate the query by understanding the intent behind the query. You can leverage information such as:
• Where the query originated from. For example, if you run a search from your company’s helpdesk intranet site, you are likely to be looking for FAQs, how-tos, or IT specialists. The search engine can now capture that intent to provide more targeted results.
• Who launched the query. If you are based in the United States and searching for employee benefits, you are more likely looking for U.S. employee benefits than for those of Canada or the United Kingdom.
• What concepts or entities can be recognized in the query. For example, if you were searching for an expense report form, the search engine will return the Excel spreadsheet, InfoPath form, or web page which enables you to file your expense report.
An example of query processing techniques combined would be a search for a weather forecast on Bing. The very first result you’ll get at the top is the weather forecast for your location. Bing automatically understands the concept behind the query, and then correlates it with information about you, the user (in this case, your location) to provide you with the forecast. It is also worth noting that this answer is not displayed like the other results on screen. It is instead carefully rendered in a visual format to enable you to quickly make a decision based on that information.
Query processing in SharePoint 2013 is intended exactly for these scenarios: to enable a smart, targeted search experience which understands what the user is searching for, and to provide the optimal result straight from the search page. This is a very exciting new capability in SharePoint 2013, as it will open up many opportunities to rapidly build new search-driven applications which will look nothing like the standard list of ten blue links.
Getting in Gear: Result Sources, Query Rules, and Result Types
So let’s dive now into the details of what SharePoint 2013 offers for query processing. We referred to query processing earlier as the evolution of search scopes and best bets. We meant it literally! In SharePoint 2013, search scopes, federated locations, and best bets are now deprecated in favor of result sources, query rules, and result blocks.
Result Sources enable you to focus searches on a subset of the total information accessible in your organization by applying extra conditions to the search queries on behalf of the end user. Stated as such, they sound very much like 2010 search scopes. The key difference here is that the extra conditions enabled in 2013 go far beyond what 2010 could do. SharePoint 2013 comes with a strong query builder to apply conditions based on the user, the search page URL (or any parameter found in it), the site, or the current date. Result sources can also be used to return results from remote content, much like federated locations in SharePoint 2010. (The result sources construct is covered in greater detail in the Federation chapter of this e-book.)
Query Rules allow conditional transformation of queries and results based on custom logic. Imagine you want to simplify searching for budget spreadsheets in your organization. Using query rules, you can type simple search queries such as “budget spreadsheet project X”, and behind the scenes the request can be transformed into something much more elaborate. The query rule could recognize the terms budget and spreadsheet in the search query and rewrite the query so that the document content type must be ‘budget’, the file type Excel, and the file content must match the project name you specified in the search keywords. Additionally, the results would be sorted by most recently modified file so that the freshest information is returned first. It is worth noting that the same Query Builder functionality used for Result Sources is also available here as a means to define conditions on query rules or transform user queries.
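To make the budget spreadsheet rewrite concrete, here is a minimal Python sketch of the transformation a query rule might perform. It is an illustration only: in SharePoint the rewrite is configured in the Query Builder rather than written as code, the function name `rewrite_budget_query` is hypothetical, and the content type name "Budget" is an assumed example. The output is a KQL-style string using the ContentType and FileExtension managed properties.

```python
def rewrite_budget_query(user_query):
    """Rewrite 'budget spreadsheet <rest>' into a stricter KQL-style query.

    Returns (query_text, sort_spec). The query comes back unchanged, with no
    sort, when the trigger words are absent.
    """
    words = user_query.split()
    lowered = [w.lower() for w in words]
    if "budget" not in lowered or "spreadsheet" not in lowered:
        return user_query, None

    # Everything that is not a trigger word is treated as the project name.
    remaining = [w for w in words if w.lower() not in ("budget", "spreadsheet")]

    # "Budget" is an assumed content type name used for illustration.
    parts = ['ContentType:"Budget"', "FileExtension:xlsx"]
    if remaining:
        parts.append('"' + " ".join(remaining) + '"')

    # Freshest files first.
    return " ".join(parts), ("LastModifiedTime", "descending")


if __name__ == "__main__":
    print(rewrite_budget_query("budget spreadsheet project X"))
```

The user keeps typing plain words, while the engine receives property restrictions and a sort order that the user never had to learn.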
The last major new feature introduced for query processing on SharePoint 2013 is the Result Type construct. A result type supports the presentation of results in a tailored way, and the result block contains a small subset of results that are related in a specified manner. For instance, you can create several result blocks for sales collateral, knowledge base articles, documentation, etc. so that when a user searches for a specific product you can make sure to always return the top two or three pieces of sales collateral or knowledge base articles matching this query.
In spite of the enhanced capabilities these tools provide, you may run into scenarios where they are not suitable or flexible enough for a particular search scenario. For example, geo-searches (ranking or filtering search results based on distance), personalized queries (complex query changes based on who executes the query), synonym expansion, etc. are not supported. In these scenarios you can still rely on the Search API to build your own web part or search application that implements the appropriate logic. The API is for the most part comparable with the version seen in SharePoint 2010, with a few exceptions. The main exceptions include the removal of the FulltextSqlQuery class and syntax, which have been deprecated, and the appearance of the SearchExecutor class, which allows you to execute multiple related queries in one shot.
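As an illustration of the kind of custom logic a geo-search would need, the following Python sketch filters and orders a list of results by distance from the user. It is a sketch under stated assumptions and not SharePoint code: the result dictionaries and their `lat` and `lon` fields are hypothetical, and in a real application the results would first be fetched through the Search API. The distance uses the standard haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_results(results, origin, max_km):
    """Keep results within max_km of origin and return them nearest first.

    Each result is a dict with hypothetical 'lat' and 'lon' keys.
    """
    lat0, lon0 = origin
    with_distance = [
        (haversine_km(lat0, lon0, r["lat"], r["lon"]), r) for r in results
    ]
    nearby = [(d, r) for d, r in with_distance if d <= max_km]
    nearby.sort(key=lambda pair: pair[0])
    return [r for _, r in nearby]
```

Filtering after retrieval like this is simple but only works on the rows already returned, which is one reason distance-based ranking is better handled by purpose-built components when result sets are large.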
No Speed Limits
Microsoft has made it very easy to create search pages using this new functionality. In fact, you don’t need programming experience to create pages, as all the functionality is user friendly and has point-and-click interfaces. Microsoft made it even easier by pushing this functionality not only to site collection administrators, but to site administrators as well. That’s right, farm-level privileges are not required: as long as you own a site (such as your personal site) you can use these capabilities to build your own search center. Two examples of applications you can build using these new features:
• A manufacturing dashboard that displays everything about a specific part based on its part number. Information could include the inventory level, the last orders for that part, the instructions on how to use that part, and forum discussions from your customers about that part.
• A knowledge portal that enables you to share FAQs, knowledge base articles, documentation, or tutorials to empower your support or helpdesk team.
Powering your applications via search has never been easier. The chapter on Search-Based Applications has many more examples, and we encourage you to explore what’s possible, and even to try building some of your own.
Deeper Dives
TechNet on query processing »
Blog: Overview of search in SharePoint 2013 »
List of terms for query builder »
New KQL syntax in SharePoint 2013 »
Query Rules and Query Suggestions
In the last chapter we introduced Query Processing and several new key concepts including Query Rules and Query Suggestions. Now, let’s go into greater detail on these and some other query features like spell check and rank management. Working with these features, you can customize search to a great degree, without writing code.
Query Rules
Query Rules are a brand new feature in SharePoint 2013, and they are designed to enable you to act upon the intent of a query and provide a remarkable amount of control and configurability. The Query Rules framework is composed of three top-level components: Query Conditions, Query Actions, and Publishing Options. These are all configurable via PowerShell, or via the UI shown to the right.
Query Conditions are rule sets that are meant to determine the intent of the query (does the query meet a rule?). Options for this include:
• Query contains a specific word or words
• Query contains a word in a specific dictionary
• Query contains an action word that matches a specific phrase or term set
• Query is common in a different source (like the Videos result source)
• Results include a common result type (like file type)
• Advanced rules which can match across a set of terms, dictionary, regular expression, etc.
If the query is against a particular result source (see the Result Source chapter in this book) or category, result source conditions can also be applied. If the Query Condition is met, Query Actions are then triggered.
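The condition types listed above can be thought of as small predicates evaluated against the incoming query. The Python sketch below models two of them (word match and regular expression) to show the idea. It is purely illustrative: the helper names and the rule list are hypothetical, and real query conditions are configured through the UI or PowerShell rather than coded this way.

```python
import re


def contains_words(words):
    """Condition: the query contains at least one of the given words."""
    wanted = {w.lower() for w in words}
    return lambda query: bool(wanted & set(query.lower().split()))


def matches_regex(pattern):
    """Condition: the query matches a regular expression (case-insensitive)."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return lambda query: compiled.search(query) is not None


def matching_rules(query, rules):
    """Return the names of every rule whose condition the query satisfies.

    rules is a list of (name, condition) pairs.
    """
    return [name for name, condition in rules if condition(query)]


# Hypothetical rules: a video intent rule and a part-number pattern rule.
EXAMPLE_RULES = [
    ("video-intent", contains_words(["video", "videos"])),
    ("part-number", matches_regex(r"\b[A-Z]{2}-\d{4}\b")),
]
```

For example, `matching_rules("training videos", EXAMPLE_RULES)` selects only the video rule, and each selected rule would then trigger its own actions.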
Query Actions specify a series of actions that take place once a query condition is met (what to do if the rule is met). These actions include:
• Assign a promoted result: This replaces the “Best Bet” and a former FAST Search for SharePoint 2010 feature known as “Visual Best Bets”. The configuration of the promoted result allows you to specify whether it should be treated as a best bet (hyperlink) or as a fully formatted HTML block (Visual Best Bet).
• Create and assign a results block: When a condition is met, one or more results blocks can be triggered. Result blocks specify an additional query to run and how to display results. This feature includes a full query designer so you can build and test queries before finalizing them. You can also include the results above those returned by core results, or interleaved by ranking. Additionally, you can choose custom display templates instead of the default for the result or results block.
• Change the ranked results by changing the query: This allows you to assign additional parameters and weighting values (XRANK boosts) to the query (Query Transforms for those familiar with FAST). For example, if the condition of the rule is met, apply an XRANK constant boost of x number of points. XRANK is a FAST capability that allows you to override the default relevancy ranking by boosting the relevancy score for particular results at query time.
Publishing Options determine when a query rule is active (when to do this). A rule may be active in a specific time interval (start date, end date) or always active (the default). You can also configure a review date, which triggers an e-mail reminder to review the rule.
The power of query rules is not only in the flexibility they provide, but also the richness and complexity that can be derived from them. Imagine a single Query Condition being met, which then triggers a visual best bet, a results block from a remote SharePoint site, a results block from a cloud source, and a query transform that will boost results coming from the cloud. In addition, publishing options would determine that these actions are only taken between November 25th and December 26th. An example of how this would work in an intranet scenario would be a query rule that is active only during insurance open enrollment windows.
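As a rough illustration of a query transform with an XRANK constant boost combined with a publishing window, consider the Python sketch below. The function names are hypothetical and the boosted expression is an assumed example; the generated string follows the KQL pattern `(match expression) XRANK(cb=N) rank expression`. In SharePoint both the boost and the active dates are set in the query rule configuration, not in code.

```python
from datetime import date


def with_xrank_boost(query, boost_expression, points):
    """Wrap a query so items matching boost_expression gain a constant boost."""
    return f"({query}) XRANK(cb={points}) {boost_expression}"


def rule_is_active(today, start=None, end=None):
    """A rule with no dates is always active; otherwise today must fall in range."""
    if start is not None and today < start:
        return False
    if end is not None and today > end:
        return False
    return True


def transform(query, today):
    """Apply the boost only inside a hypothetical seasonal publishing window."""
    window = (date(today.year, 11, 25), date(today.year, 12, 26))
    if rule_is_active(today, *window):
        # 'ContentSource:"Cloud"' is an assumed expression for illustration.
        return with_xrank_boost(query, 'ContentSource:"Cloud"', 100)
    return query
```

Outside the window the query passes through untouched, which mirrors how an inactive rule simply has no effect on results.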
Query Suggestions
Query suggestions enable users to ask better questions, and make it simpler to search for information. This feature was sorely lacking in SharePoint 2010. In SharePoint 2013, Query Suggestions are supercharged, thanks in part to the addition of the Analytics Processing Component and the Analytics Reporting Database. These components provide for analytics aggregation and persistent storage of these analytics. Some key new features include:
• My Queries: Personal Query Log (in the Analytics database), which factors your personal SharePoint activity into the query suggestions.
• My Sites: This capability tracks sites you have visited, and factors them into the query suggestions.
• Our Terms: This feature uses information related to the most frequent queries across all users that “match” the search terms.
Query Suggestions now take two forms: Pre-Query Suggestions and Post-Query Suggestions. Both of these help the user ask better questions by showing what others have asked before; they differ in when they are displayed and how people use them.
Pre-Query Suggestions occur prior to a query being executed. The goal of pre-query suggestions is to aid users in selecting a query to help them find information, and to assist them in writing better queries. These suggestions are provided in two forms:
1 A list of items that others are typing for their queries.
2 A list of items you have clicked on before from your personal query log.
A key aspect of this feature is that it will never provide a suggestion to a search that did not yield a click-through (someone clicking on the document), and it will never provide a suggestion if the results would lead to a dead end (zero-result query).
Pre-Query Suggestions include both a list of queries from other users and a list of items you have clicked on before, as shown in the screenshot below.
Post-Query Suggestions are provided after a query is executed and when results are displayed. These suggestions are based upon the results that you have clicked on at least twice. They provide a quick means to go back to a document that you regularly review or select. They are similar to the “Related Queries” provided with SharePoint 2010.
Suggestions can also be tuned (inclusions and exclusions) within the Service Application Admin Pages. It is also important to note that these are not tuned at the site collection level, but only at the SSA level.
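The two guarantees described for pre-query suggestions (no suggestion without a click-through, and no suggestion that leads to zero results) amount to a filter over the query log. The Python sketch below shows one way such a filter could look. It is an illustration, not SharePoint's implementation: the log entry fields `query`, `clicks`, and `result_count` and the function name are hypothetical.

```python
def eligible_suggestions(query_log, prefix, limit=5):
    """Suggest past queries that start with prefix, most clicked first.

    query_log is an iterable of dicts with hypothetical keys
    'query', 'clicks', and 'result_count'. Entries with no click-through
    or with zero results are never suggested.
    """
    wanted = prefix.lower()
    totals = {}
    for entry in query_log:
        if entry["result_count"] == 0 or entry["clicks"] == 0:
            continue  # dead end or never clicked: not a useful suggestion
        query = entry["query"].strip().lower()
        if query.startswith(wanted):
            totals[query] = totals.get(query, 0) + entry["clicks"]

    # Most clicked first; ties broken alphabetically for stable output.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [query for query, _ in ranked[:limit]]
```

Ranking by click count is an assumption made for the sketch; the point is that only queries proven useful by earlier users ever reach the suggestion list.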
Query Spell Correction
Spell correction is a familiar and very useful feature, since humans are prone to misspelling and of course fat-fingering. SharePoint 2013 provides spell correction by default, as shown below:
In SharePoint 2010, spell correction was implemented as a series of XML files that defined inclusion and exclusion items for the dictionary. In SharePoint 2013, Query Spell Correction is managed from within the term store of the Managed Metadata Service, where Query Spellcheck Exclusions and Inclusions are nodes, as shown below. Dynamic dictionary creation is still supported, but is now managed from within the term store.
Within the user interface for search, Query Spell Corrections can be configured to use “Did You Mean” type functionality for query transforms.
Working Wonders with Queries
The mechanisms for query rules, query suggestions, and query spell checking are new with SharePoint 2013, and they may take some getting used to. SharePoint 2010 had some capabilities that processed queries, such as the keyword features for synonyms, best bets, and promotions/demotions; these are now replaced by query rules. Once you become familiar with these new features, you will find you can work wonders.
In spite of all of the obvious pluses, there are some limitations with query rules. You can’t call a program from a query rule, which blocks a variety of use cases. For example, synonym expansion is done on full queries, and on pre-built synonyms. This makes expansion easy to understand but has been a big annoyance to many search administrators in SharePoint 2010. This limitation can be addressed, but only through applications available through the Microsoft partner ecosystem, not via query rules. Calling applications based on query patterns (for example, pulling up an ATM location app when users search for ‘bank branch near me’) is feasible in SharePoint 2013, but not directly from query rules.
However, these limitations are important only for a specialized set of search applications. The power of query rules, and query processing generally, in SharePoint 2013 is light-years ahead of other search platforms. Learn to use these mechanisms, and you will be in a great position to dazzle your business users with the power of search.
Deeper Dives
Good blog post on query rules »
TechNet on query processing »
List of terms for query builder »
New KQL syntax in SharePoint 2013 »
Result Types and Result Templates
There’s another new concept in SharePoint 2013 search, called Result Types. Result types let you control how search results will be displayed, and let you display different content in different formats. For example, if you have e-mails, documents, and database records in the same result set, you may want to use different formats for each and display different managed properties for each. With SharePoint 2010, this meant creating complex XSLT, and there was no easy way to group similar results together for presentation. With SharePoint 2013, wizards ease configuration of displayed results, and HTML and JavaScript enable you to add finishing touches if needed.
The screenshot below has multiple result types, presented in result blocks. Videos, documents, personal recommendations, and a “visual best bet” (though it’s no longer called that) all have their own presentation and their own result template.
Results Framework Redux
The Results framework is composed of three parts (as shown below):
• Rules Engine: A list of rules to determine if the result type should be triggered.
• Property List: Associates the rule to document type, content type, or other managed property within SharePoint search.
• Rendering Template: Defines how that particular result will be displayed.
Result Types Unleashed
The power of Result Types really becomes evident when looking at a real-world scenario. In the scenario below you have multiple documents that have been assigned content types (e.g. specification documents, data sheets, etc.).
Within Result Types you can:
1 Specify a rule based upon specific criteria. The rules can contain fairly advanced features, such as Boolean logic (AND, OR, NOT), equality (= or !=), or comparison (< or >). These rules can also be applied to managed properties. For example, the rule might be ContentType = “spec documents”.
2 Specify which managed properties you would like to have returned once rule conditions have been met. You must specify at least one managed property before it can be used in a rendering template.
3 Specify where you would like the requested property list items to be displayed, using a tagging convention such as _#= contenttype =#_ in a Rendering template. The Rendering template is composed of HTML and might contain JavaScript. Within this simple-to-edit template (unlike editing XSLT in SharePoint 2010) you can call specific graphics (icons, etc.) and style it in any way that you would normally style HTML.
Result types may seem complex to master, but once you become familiar with them you will appreciate how powerful they are. There are impressive tools in SharePoint 2013 that facilitate ease of use, and formatting is done using any tool you are familiar with. (SharePoint Designer has dropped the ability to do this kind of formatting, which will be annoying to some, but there are lots of great tools available to work with HTML and JavaScript.) With SharePoint 2010, very few people actually did the kind of formatting and result templating that was possible, because it was too complex and arcane to use. With SharePoint 2013, you will quickly find that result types and result templates are enjoyable to work with, and you’ll discover that you use them naturally to make search results look great and work well for users.
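The three steps above (rule, property list, rendering template) can be mimicked in a few lines of Python to show how they fit together. This is a conceptual sketch only and not how SharePoint renders results: the rule list, the sample templates, and the function names are hypothetical, the templates are plain text rather than HTML for brevity, and the tokens are written in the `_#= name =#_` form used by SharePoint display templates.

```python
import re

# Matches tokens such as "_#= Title =#_" and captures the property name.
TOKEN = re.compile(r"_#=\s*(\w+)\s*=#_")


def render(template, item):
    """Replace each token with the item's property value (empty if missing)."""
    return TOKEN.sub(lambda m: str(item.get(m.group(1), "")), template)


# Hypothetical result types: (rule, rendering template). First match wins.
RESULT_TYPES = [
    (lambda item: item.get("ContentType") == "spec documents",
     "[SPEC] _#= Title =#_ (rev _#= Revision =#_)"),
    (lambda item: True,  # fallback for every other item
     "_#= Title =#_"),
]


def render_result(item):
    """Pick the first result type whose rule matches, then render its template."""
    for rule, template in RESULT_TYPES:
        if rule(item):
            return render(template, item)
    return ""
```

Each item dictionary stands in for the managed properties returned for a result, which is why a property has to be requested in step 2 before a template can display it.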
Deeper Dives
Customizing search results via Result Types and Display Templates »
TechNet: Query variables »
CHAPTER 3
Working with Content — Crawling, Connectors, and Content Processing
CHAPTER 3 CRAWLING, CONNECTORS, AND CONTENT PROCESSING
Content Capture
Capturing content is fundamental to search: if it’s not crawled and indexed, you can’t find it! The process of connecting to content sources, crawling them to get content, and making that content searchable is far more complex than most people realize. It was also one of the most frustrating areas to manage with SharePoint 2010.
As a quick orientation, the basic function of a crawler is shown in the figure below. The concept is simple enough: the crawler connects securely to a given content source, maps the content from the source system to the crawled properties of the search engine, and feeds the engine in either a full crawl or an incremental crawl (which finds any changes). What makes content capture different from one search engine to the next is the breadth of connectors, the coverage of different security models and data types, the performance (both throughput and latency), the robustness, and the ease of administration. SharePoint 2013 does well on all counts, although most connectors are supplied by Microsoft’s partners, not Microsoft.
SharePoint 2013 supports multiple crawl components, crawl databases, and content sources as shown below. There are a number of connectors included out of the box:
• SharePoint
• HTTP (web crawler)
• File Share
• Business Data Connectivity (BDC) Framework, which also includes these connectors built on the BDC framework:
  – Exchange Public Folders
  – Lotus Notes
  – Documentum Connector
  – Taxonomy Connector (connects to MMS)
• People Profile Connector
Connector and Crawling Changes
For the most part, these connectors are essentially the same as the connectors in SharePoint 2010. The connector and crawler infrastructure are the part of SharePoint 2013 taken most directly from SharePoint search, so they have the fewest changes. While few, there are still some notable changes. The web crawler has some nice updates that address previous headaches. These changes include:
Anonymous Crawl for HTTP
Anonymous authentication allows any user on a web site to access any public content without a user name and password challenge. SharePoint 2013 allows you to get at these web sites without associating a crawl with a user account. This is handy for general web crawling and makes the setup of web crawls simpler. SharePoint 2010 used the spsearch account to log into sites, which stymied many people trying to crawl SharePoint sites with anonymous access, public web sites, and the like. Previously there were workarounds, but they were painful. The updated functionality now offers a pain-free way to perform this task.
Overall, the most noticeable change in content capture is Continuous Crawling. This is a new method of ensuring you have the most current data in your search index, and it is available only for SharePoint content. Rather than living with a latency of several minutes and with full crawls that might take many minutes to start populating content in the index, you’ll see content within seconds! When you enable continuous crawls (using the UI shown below), a crawl schedule no longer applies: you are running crawls in parallel, and the crawler gets changes from SharePoint sites every N minutes (set to 15 minutes by default, but this parameter is changeable). Continuous crawls do not stop for errors, but rather note the error and continue to crawl content. Continuous crawls can occur while other crawls (full or incremental) are active or starting, whereas incremental crawls need to wait for other incremental crawls to complete before starting. With this capability you can now keep content fresh, and you won’t experience mysterious delays when additional content sources are added.
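The "note the error and continue" behavior is the key difference from a crawl that aborts on the first failure. The Python sketch below illustrates that pattern on a toy in-memory index. It is a conceptual illustration and not the SharePoint crawler: the change records, their `id`, `action`, and `fetch` fields, and the function name are all hypothetical.

```python
def crawl_changes(changes, index, error_log):
    """Apply one batch of changes to the index without stopping on errors.

    Each change is a dict with an 'id', an 'action' ('upsert' or 'delete'),
    and, for upserts, a 'fetch' callable that returns the item content.
    Failures are recorded in error_log and the crawl moves on.
    """
    for change in changes:
        try:
            if change["action"] == "delete":
                index.pop(change["id"], None)
            else:
                index[change["id"]] = change["fetch"]()
        except Exception as exc:  # note the error, keep crawling
            error_log.append((change["id"], str(exc)))
    return index
```

Running this function on a timer, with each run picking up only the changes since the previous run, approximates the continuous model: one bad item delays only itself, not the rest of the batch.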
Asynchronous Web Part Crawl
A common way to improve performance of SharePoint sites has been to load web parts asynchronously, which dramatically speeds up the first display of the page. However, crawling these pages for search delivered incomplete information, because the crawler saw the page before the asynchronous content had loaded. In SharePoint 2013 search, the crawler now gets a full rendering of the page in order to index it. This doesn’t work for all asynchronous pages, just for most out-of-the-box web part content. But it takes care of the vast majority of problems in this area.
The Taxonomy connector is new in SharePoint 2013, and you will see it at work even when you don’t use it explicitly, since the term store is much more integrated with search. As you will read in other chapters of this book, you can now create entity extractors directly from term sets, set up WCM page hierarchies using the term store, define faceted navigation using taxonomies, and much more.
Building New Connectors
When you start getting into search, you quickly find that you want to get at more and more kinds of content from more and more places. Data silos are everywhere, and search lets you bridge these silos easily and securely. In order to do this, you need a connector for each content source, and many organizations have dozens of systems that require connectors well beyond what comes out of the box. Luckily there are two options: leverage a rich set of partner-built connectors or (if you are a developer) create new ones yourself.
SharePoint 2013 will still support existing protocol handlers (which are custom interfaces often written in unmanaged C++ code), using an interface used since SharePoint Portal Server 2003 and deprecated since SharePoint 2010. These can still be good for high performance or particular tasks. But the primary way to create connectors is through the BDC Framework, which was introduced in SharePoint 2010 as part of Business Connectivity Services (BCS). BCS is an umbrella term for a set of technologies that brings data from external systems into SharePoint Server 2013 and Office 2013 (shown in the figure below).
As with SharePoint 2010, you can make new connectors pretty simply. For systems with static schemas, straightforward security, and moderate performance needs, this is not a huge job. There are some great improvements in Business Connectivity Services as a whole. For example, there’s tooling specifically to create External Content Types against OData sources, there are Representational State Transfer (REST) and Client Side Object Model (CSOM) interfaces, and External Content Types can be scoped to a single SharePoint app. Unfortunately, none of these apply to search, because creating an indexing connector for search is not the same as creating an External Content Type.
The Business Data Connectivity (BDC) framework is largely the same in SharePoint 2013 as it was in SharePoint 2010 when it comes to search. There is one notable change though: Claims tokens are supported through the BDC. Previously, only Active Directory (AD)-format Access Control Lists (ACLs) were supported, which made it nearly impossible to cover some complex security scenarios. With Claims support, many of these scenarios are tractable, though still very much the domain of experts.
One warning: don’t underestimate the effort involved in connector development, deployment, and maintenance. Don’t fear connector development, but watch out for the classic “quicksand” trap. Too often a development project gets to basic connectivity quickly but then struggles to get security right and to reach high performance and scale. If and when this succeeds, the project is then dragged further down by troubleshooting and maintenance, since things change every time the source system changes. Plan your development carefully to avoid this trap. The best way to avoid it is to consider pre-built connectors for any complex system; that way you don’t have to build your own from scratch, and you don’t have to maintain it.
Changes from FAST Search for SharePoint
If you are used to FAST, there are a number of changes you will notice. These changes are all a byproduct of moving to a common, single search engine. First, there is no way to ‘push’ content to the index in SharePoint 2013. (With FAST, there was a mechanism called the Content API.) There are also three connectors that you will notice are gone:
• The Lotus Notes connector, which had performance, security, and flexibility features beyond the Notes connector included with SharePoint 2010 and 2013.
• The Enterprise Web Crawler, which rendered dynamic sites, had high performance, and offered several high-end features.
• The Java Database Connectivity (JDBC) connector, which supported direct SQL access to databases.
Though these may seem like big gaps, there are ways to cover this functionality in SharePoint 2013, either with different mechanisms (many cases covered by the JDBC connector can be handled via the BDC), or with pre-built connectors from Microsoft Partners.
From Crawl to Index
Many of the most significant content capture changes you’ll see with SharePoint 2013 search don’t actually result from the connector and crawling components. For example, the content processing component adds some remarkable capabilities that show up to the end user looking like better content. The Indexer has lower latency and is much more robust, which is one key to continuous crawling. It also alleviates many of the weird issues people encountered with crawling after outage events in SharePoint 2010, which could leave the crawler and index out of sync. Additionally, improvements in schema management make mapping content much simpler with SharePoint 2013. All of these areas are covered in other chapters of this book, but they contribute to the improvements discussed above to provide robust, scalable, and high performance content capture. This is a great foundation to build on for any search deployment or search-based application.
Deeper Dives:
• TechNet on managing continuous crawls
• MSDN on searching new content with SharePoint 2013
• Longitude Connectors Overview
Content Processing
Content Processing is an essential pillar of search quality, but it is typically invisible to the end user. The development of content processing in SharePoint 2013 is focused on implementing platform-wide capabilities, and on integrating and supporting built-in search-based applications such as WCM and e-discovery. To support the wide range of scenarios that depend on search, Microsoft provided extensibility, so that customers and partners can leverage the new search platform and hook into content processing. The Content Processing component is brand new with SharePoint 2013. It takes content from the crawler and prepares it for indexing, as shown below. With SharePoint 2013, there is also a new Analytics Processing component that feeds information into Content Processing.
New Content Processing Subsystem: With a Heritage
To understand the changes in the content processing structure within SharePoint 2013, it is useful to look at the heritage of this release, especially for the content processing component. Those familiar with the final version of a “stand-alone” search engine offering from Microsoft, FAST Search for Internet Sites 2010 (aka FSIS), will know that FSIS was composed of three main structural components, as shown below. They were:
• The core FAST search engine (FAST ESP 5.3), in red in the figure below, which was a complete search engine to which new components were added on the content side and query side.
• Content Transformation Services (CTS), which was responsible for content processing and ingestion and introduced the concept of processing flows. Flows are much more dynamic and expressive than the straight linear pipeline architecture found in FAST Search for SharePoint 2010.
• Interaction Management Services (IMS), which managed all query and result processing, using processing flows.
In SharePoint 2013, the underlying dataflow engine for content processing, which was first introduced as CTS, has been extended and enriched to host the content processing tasks for the entire SharePoint platform. Successful integration of a new content processing flow for search and enrichment across the whole SharePoint platform is a significant investment and engineering achievement. The benefits are potential scale-out, improved management, a “cloud ready” system architecture, and an improvement to Microsoft’s ability to integrate new content enrichment features inside the SharePoint platform.
New Capabilities for the IT Pro
For the IT Professional concerned with how content is processed, enriched, and made ready for search, these SharePoint 2013 content processing feature areas stand out (we cover these in more depth in the chapter on Linguistics):
• Linguistics features, in particular around phonetic search for person names, continue to improve in scope and depth. Cross-lingual name search (via People Search), for example, is a remarkable feature that makes it easy to find people (since human names are notoriously hard to spell right).
• Entity Extraction management, which was previously done via a set of separate files and ad hoc PowerShell scripting, has moved into the Term Store. This is a big win because there is now a good UI and a robust set of tools with it.
• New format handlers implement document parsing. They replace IFilters for OOB document metadata.
• Higher throughput for Office document types and for PDF.
• Automatic content-based file format detection removes dependencies on file extensions.
• Content processing throughput and error reporting (this is tied to crawl reporting) is comprehensive and far simpler to understand.
Search analytics processing (which we cover in more depth in the chapter on Analytics) is an important new platform capability. The analytics module feeds information back into Content Processing for a variety of purposes, for example to improve search relevance based on user behavior. Usage and search action events (document exposures and document click-throughs) are recorded into a new SharePoint 2013 analytics store. They are then processed into a form that lets search relevance account for popular content, relevant query terms, or, in the context of recommendations, boosts for related user and related item results. This also supports search history boosts.
Hooks for the Savvy Developer
For developers familiar with the extensibility of FAST Search for SharePoint, SharePoint 2013 offers similar mechanisms. However, the content processing flow and search index are not as open as with previous FAST platforms; they are more of a streamlined and closed utility. You will enjoy how easy it is to set up and operate these capabilities, and how little head-scratching you do in development, but you will be frustrated at how little you can get at. This is a sensible tradeoff in the context of a major platform upgrade and in accommodation of a hosted multi-tenant deployment model (O365). The capabilities and the ability to extend them are still there, but they feel limited. There are times when it takes sophistication and inventiveness to do what you want with the hooks provided.
The extension point for content processing is the Content Enrichment Web Service (CEWS). This is a new mechanism to enable content processing, called from a content processing flow at a single point, as shown below. We will cover CEWS in more depth in its own chapter, and touch on its applications in the chapter on Linguistics.
* Note: the CEWS call-out is not part of O365 and is only available for the Enterprise Edition of SharePoint 2013.
SharePoint’s management of content processing is highly scalable and streamlined. SharePoint 2013 content processing straddles the on-premises deployment of SharePoint and the deployment of SharePoint in hosted form via O365. If content enrichment beyond what is provided in SharePoint 2013 is important for your application, especially for content you already have, prepare to look for custom solutions that leverage the Content Enrichment Web Service.
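As a rough illustration of the call-out pattern, the Python sketch below models the contract: the flow hands the service a set of properties for one item, and the service returns only the properties it wants added or changed. This is a conceptual model, not the actual CEWS interface. The function names, the property names, and the region lookup are all assumptions made for the example.

```python
from typing import Dict

# Hypothetical lookup used to derive a new property from an existing one.
REGION_BY_COUNTRY = {"NO": "EMEA", "US": "Americas", "JP": "APAC"}


def enrich_item(properties: Dict[str, str]) -> Dict[str, str]:
    """Return the properties to add or overwrite for a single crawled item."""
    changes: Dict[str, str] = {}

    # Normalize an existing property.
    author = properties.get("Author", "")
    if author != author.strip():
        changes["Author"] = author.strip()

    # Derive a new property that was not in the source content.
    country = properties.get("Country", "").upper()
    if country in REGION_BY_COUNTRY:
        changes["Region"] = REGION_BY_COUNTRY[country]

    return changes


def process(properties: Dict[str, str]) -> Dict[str, str]:
    """Simulate the flow: call out once, then merge the returned changes."""
    merged = dict(properties)
    merged.update(enrich_item(properties))
    return merged


result = process({"Author": "  A. Molnar ", "Country": "no", "Title": "Crawl notes"})
```

Because the service sees one item at a time and is called at a single point in the flow, anything it needs (lookups, models, dictionaries) has to be available inside the service itself.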
Deeper Dives
• MSDN Section on Content Enrichment Web Service (CEWS)
• TechNet content processing description
Linguistics Processing
Linguistic processing, which aims to leverage the meaning of documents or words, is the ‘special sauce’ of search, and one of the most mysterious and difficult areas to understand. Human language is a tricky thing, and algorithms aimed at understanding it are complex and imperfect, yet this is what makes it seem like ‘search just works’ for end users. Linguistic tools, such as spellchecking of queries or grammatical normalizing of content or queries, can greatly simplify users’ search experience. Covering a wide variety of languages (SharePoint 2013 search covers 85 languages, from Afrikaans to Zulu) also means that you can find content that is generated by users from across geographic boundaries.
Linguistic processing is applied to both content and queries (as shown above), using a similar framework under the hood. As mentioned in other chapters, the content processing and query processing components have a heritage from modules called CTS and IMS, and they share an underlying framework for processing flows.
In preparing content for indexing, linguistics are applied in stages, each one building on the previous one. The figure below gives an overview of these steps in what is often called the ‘pipeline’. (The steps in gray are not OOB, but illustrate some of what is possible by adding third-party components.)
First, files must be parsed, teasing the indexable text out of PowerPoint, OneNote, PDF, etc. During this process the language is detected, since processing English is different from processing Japanese. Words and patterns (dates, times, URLs, etc.) are found, based on the text and language. Next, the ‘magic’ begins: a variety of Text Analytics technologies are applied. Stemming or lemmatization (which allows forms of the same base word to be matched, for example “sing”, “singing”, “sung”, or “incorporate” and “incorporating”), synonyms (matching, for example, “car” and “auto”), and concept detection of various forms deal with the wide variety of ways humans say essentially the same thing. Entity extraction, which is a key linguistic capability for SharePoint 2013 search, and techniques like categorization, relationship extraction, and sentiment analysis add metadata that greatly improves the ability to find and explore information.
Microsoft has the deepest natural language processing development capability on earth, because it has labs around the planet. This was strengthened with the FAST acquisition, since one of FAST’s specialties was linguistics applied to search. Strong language processing features show up in SharePoint 2013 search, which has continued a tradition of steady improvement in this area and has some extremely strong linguistic technology, including many improvements over SharePoint 2010. Some of the changes will be directly apparent to the end user, but many of them show up in subtle ways, and some are only relevant to specialists handling unusual situations.
For those coming from SharePoint 2010 search, there are some remarkable new capabilities and improvements. For those coming from a FAST-based platform, the capabilities are familiar, but are now much easier to work with. There are some capabilities you are used to from FAST which are no longer there; we mention the major ones as we cover each area. Some changes in SharePoint 2013 will be noticeable to nearly all search deployments: document parsing is foremost, but also synonym management and custom entity extractors. Some changes will only be apparent or available to those extending search, and some will be visible only to a specialized group of deployments.
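The stemming and synonym matching described above can be illustrated with a small Python sketch. It is a toy under stated assumptions: the lemma and synonym tables are hand-written for this example, and real linguistic components rely on full dictionaries and language-specific rules rather than lookups like these.

```python
from typing import Dict, Set

# Tiny hand-written tables; real linguistics components use full dictionaries.
LEMMAS: Dict[str, str] = {
    "singing": "sing", "sung": "sing", "sang": "sing",
    "incorporating": "incorporate", "incorporated": "incorporate",
    "cars": "car", "autos": "auto",
}
SYNONYMS: Dict[str, str] = {"auto": "car", "automobile": "car"}


def normalize(token: str) -> str:
    """Lowercase, reduce to a base form, then map synonyms to one canonical term."""
    word = token.lower().strip(".,;:!?\"'()")
    word = LEMMAS.get(word, word)
    return SYNONYMS.get(word, word)


def index_terms(text: str) -> Set[str]:
    """Normalized terms for a piece of text, ignoring empty tokens."""
    return {normalize(t) for t in text.split() if normalize(t)}


def matches(query: str, document: str) -> bool:
    """True when every normalized query term appears in the document."""
    return index_terms(query) <= index_terms(document)
```

With these tables, `matches("sing", "She had sung before.")` and `matches("car", "Used autos for sale")` both hold, because the same normalization is applied to the query and to the content.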
Changes in Document Parsing
SharePoint 2013 introduces a completely new document parsing facility, with some big improvements. These changes include:
• Automatic file format detection no longer relies on file extensions, eliminating the kind of errors that happened when users or applications did creative things like making .memo files.
• “Deep link extraction” works like a table of contents generator and allows you to click into previews for Word and PowerPoint formats.
• Metadata extraction for titles, authors, and dates provides better metadata and is much easier to understand than the techniques used in SharePoint 2010 (where “Optimistic Title extraction” was one of the top sources of user confusion).
• High-performance format handlers for HTML, DOCX, PPTX, TXT, Image, XML and PDF formats mean faster crawls and indexing.
The new parsing facility is enabled by default and supports 55 of the most common file formats, including things like Montage, Visio and OneNote. By comparison, the 2010 Microsoft Filter Pack supported 15 formats, and the Advanced Filter Pack (available for FAST only) supported 422. For most deployments, this means you will no longer have to seek out third-party IFilters, though the IFilter API is still supported and there is a rich assortment of IFilters on the market that cover file types beyond the OOB 55.
Other Changes You’ll Notice
Language detection has changed with SharePoint 2013. In SharePoint 2010, language detection was done ‘chunk wise’ on document parts like paragraphs. Now a much larger part of the document is used. The advantage is that language detection is generally better: the more text you can look at, the more reliably you can tell what language it is in. There is a downside to this approach, however: documents that mix languages (partly in English and partly in French, for example) aren’t handled as well.
The Term Store (MMS) is well integrated with search now, which provides a number of big benefits. Customizations to Query Spelling Correction are now managed in the term store, both inclusions and exclusions (shown below).
Property Extraction (previously a FAST-only feature) is also manageable in the term store (shown below). However, only company names are available; if you were using property extraction for people names or place names, you’ll need to find a third-party alternative. Some things are still managed outside of the term store: Synonyms via a UI or PowerShell, Custom Extractors via PowerShell, and spell correction via a dynamic dictionary based on content in the index or a static OOB dictionary.
Offensive Content Filtering was a feature that could be enabled in FAST Search for SharePoint. This feature made it easy to shield users from the obscenities and profane language that are found in content (even business content) remarkably often. However, it is no longer supported with SharePoint 2013, so you’ll need to find a third-party alternative if this is important to you.
Substring search, another FAST-only feature, was also removed. This provided n-gram matching without taking word boundaries into consideration, which was good for applications like part numbers.
Changes in Extensibility
There are notable changes in how you can extend linguistics processing with SharePoint 2013. These include:
Custom Extractors (previously FAST only) are more powerful, and you can have more of them (12 rather than the five allowed with FAST Search for SharePoint). These allow you to provide a list of terms (via PowerShell) and match them in the content, populating managed properties with consistent metadata, which is the lifeblood of information discovery.
Custom Word-Breaking now requires only one language-independent dictionary, rather than the one-dictionary-per-language facility in SharePoint 2010.
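The idea behind a custom extractor, matching a supplied term list against content and writing consistent values into a property, can be sketched in a few lines of Python. This is an illustration only: the dictionary, the `Products` property name, and the whole-word matching rule are assumptions, and the sketch does not reproduce how SharePoint stores or applies extraction dictionaries.

```python
import re
from typing import Dict, List

# Hypothetical dictionary: surface form -> consistent value for the managed property.
PRODUCT_TERMS: Dict[str, str] = {
    "sharepoint 2013": "SharePoint 2013",
    "sp2013": "SharePoint 2013",
    "fast search": "FAST Search",
}


def extract_terms(text: str, dictionary: Dict[str, str]) -> List[str]:
    """Return the canonical values whose surface forms occur as whole words in text."""
    found: List[str] = []
    lowered = text.lower()
    for surface, canonical in dictionary.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        if re.search(pattern, lowered) and canonical not in found:
            found.append(canonical)
    return found


doc = {"body": "Migrating from FAST Search to SP2013 took three weeks."}
doc["Products"] = extract_terms(doc["body"], PRODUCT_TERMS)
```

Mapping several surface forms to one canonical value is what makes the resulting metadata consistent enough to use as a refiner.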
Custom stemming (done via registry settings in SharePoint 2010) is no longer supported. Third-party specialists will find ways to customize this level of linguistics and handle specialized cases.
The biggest change is the availability of the Content Enrichment Web Service (CEWS). This provides a way to add linguistic processing of any type, such as the examples in gray in the pipeline figure above (concept extraction, relationship extraction, geo-tagging, summarization, etc.). With FAST Search for SharePoint, it was possible to extend the content processing pipeline through a sandboxed application, but this was both slow and limited in the information it could access. SharePoint 2013 introduces a much more open API which makes it possible to add specialized linguistics at lower levels as well as sophisticated text analytics. CEWS is covered in more depth in a separate chapter.
Putting Linguistics to Work
All of these cool capabilities come into their own when developing more specialized search-based applications. This has become much more powerful with the application development hooks and tooling available, and you should expect to see some amazing applications built on SharePoint 2013 using these capabilities.
Deeper Dives
• TechNet article on linguistic search features in SP 2013
• MSDN Section on Custom Word Breakers
• Longitude AutoClassifier Overview
CHAPTER 4
Architecture, Deployment, and Operations — Getting Under the Hood
CHAPTER 4 GETTING UNDER THE HOOD
New Architecture, Single Search Engine Core
The first and foremost change to search within SharePoint 2013 is that there is only one search engine core. The idea that you would use the FAST engine for content and the SharePoint engine for people is completely eliminated in this release. There is now only one search engine within the SharePoint 2013 platform, which you can think of as bringing FAST to all search tiers. Powerful indexing, linguistics, extraction, and query expressiveness that are the heritage of FAST are now evident throughout the platform.
To appreciate the evolution from SharePoint 2010, it’s worth looking at the history in this area. The acquisition of FAST Search and Transfer in 2008 was regarded by the industry as a major step forward in taking the lead in the enterprise search marketplace. The incorporation of FAST within the overall SharePoint 2010 architecture allowed organizations to leverage enterprise class search capabilities in a platform that was within the cost and budget requirements of today’s enterprises. Unfortunately, the acquisition occurred midway between release cycles. This forced Microsoft to determine which features would be available in the wave 14 release (SharePoint 2010), and which features would need to wait for the next release.
FAST Search for SharePoint is a very powerful product, but there are numerous rough edges due primarily to the lack of time in the previous development cycle. The timeline also resulted in a hybrid architecture, with separate SharePoint and FAST farms, as shown below. This could be awkward and confusing to work with.
With the release of SharePoint 2013, the full realization of Microsoft’s investment in FAST Search and Transfer is now evident. The capabilities now available take enterprise search to a whole new level. They are the result of a new search architecture. The architecture, shown below, is relatively simple, though much of it is new.
There is a good walkthrough of the components on TechNet, which we won’t repeat here. Each of the components shown is also covered in at least one chapter of this book. However, before we move forward, there are a few essential things to understand:
• Search is fully integrated into SharePoint, and there is no longer a separate Search Server. A SharePoint 2013 server or services farm can certainly be dedicated to search. To do this, you will want to have the MMS (term store) and User Profile service, at minimum, much as you did in SharePoint 2010.
• There are four different databases, each independent from the others. All of them can be partitioned, mirrored, and managed. The Crawl database scales with the amount of content crawled, so this is typically the database that has multiple instances in a large search deployment.
• Every component can be scaled out for capacity and for fault tolerance. Previously, there could be only one Search administration component, which meant you had to use creative workarounds to create truly fault-tolerant configurations.
• Search is now multitenant, except for a few things such as the CEWS API. Much more administration can be done at the site collection (or tenant) level.
Not ‘just’ a Merger of FAST and SharePoint
You can think of this architecture as bringing FAST to every tier of SharePoint, but it is much more than that. This is not a mere merging of FAST and SharePoint: nearly every component in this architecture is new. Just as SharePoint 2013 is a major architectural release overall, search is in many ways a radical re-architecture. The computational platform underlying the search-based interaction for SharePoint 2013 is a powerful distributed dataflow engine (called NodeRunner). An illustration that underscores this is shown below. This is the same architecture, though not using the official terminology.
The Crawl and OOB connectors (aka crawl component) are the least changed part of search in SharePoint 2013, and they retain the mssearch.exe name under the hood. The Content Processing Framework and Interaction Management Framework (aka Query Processing Component) are running flows, similar to CTS and IMS in the FAST Search for Internet Sites 2010 product (see the Content Processing chapter). These are running under NodeRunner. So is the search core, which is neither the FAST ESP core nor the SharePoint Search core.
It’s a new, next-gen search core that was the result of a decade of research and development at FAST, hardened through the Microsoft development process.
If you look inside the search service, you will find several search processes. These include MSSearch.exe (for the crawl component), NodeRunner.exe (which hosts search components), and a Host Controller (a Windows Service that supervises NodeRunner processes). The Host Controller monitors NodeRunner processes, detects failures, and restarts processes if they do fail. There can be multiple NodeRunner instances on the same server, each hosting one search component. On a default single server install there will be 5 instances of the NodeRunner.exe process, as shown to the left.
Also new in this architecture is the Analyzer (aka Analytics Processing Component), which we cover in the chapter on Analytics. The content processing component writes information about links and URLs to the link database. In turn, the analytics processing component writes information related to the relevance of these links and URLs to the search index via the content processing component. This enables some powerful capabilities like recommendations and usage-based relevance enhancement.
Although there is a fascinating dataflow engine and a next-gen search core, those are not exposed for developers: the only points of configuration for interaction are ResultSources, QueryRules, and CEWS. In SharePoint 2013, configuration alternatives are circumscribed to ensure that no configuration results in excessive resource consumption for one instance relative to other instances that may be running through the same service. So, QueryRules effectively run in a sandbox that restricts calls to non-SharePoint services.
Full Range of Search Topologies
When people think of architecture, some think of deployment topologies: machines, nodes, and processes. There is lots of good material on this physical architecture, which we will not repeat here. But we’ll give you a flavor. As with SharePoint 2010, the minimum configuration is just one node, and the minimum configuration with fault tolerance is two nodes (FAST Search for SharePoint required two and four nodes respectively). Scaling from there to ultra-scale search (including the scale of O365) is possible, and you can grow incrementally. The medium farm topology, shown below based on the TechNet recommendation at www.microsoft.com/en-us/download/details.aspx?id=30383, is capable of supporting approximately 40 million items in the index. Note that we expect the density (items per node) of SharePoint 2013 search to go up dramatically over time, just as FAST Search for SharePoint density did. The initial focus has been on scale-out in order to support O365, not on density.
What Matters is it Works
As a developer or user you don’t really need to know about the underlying algorithms or dataflow engine used in search. In fact, the search algorithms used by almost all search cores are a complex combination of linguistics and statistics, tuned heuristically. You can enjoy the result, and learn how to operate it well, apply it to different applications, and develop on top of it. What will you notice about this architecture? There are many things beyond the capabilities that meet the eye. For example:
• The core engine is different, so relevance is different. Since Microsoft has a lot of data with which to tune relevance, you’ll notice first that the relevance is better OOB. But if you had customized relevance or spent time focused on it, you may have some work to do, or you may have a pleasant surprise.
• Indexing is atomic in the new search core. That has some very interesting implications, but mostly you’ll notice that it’s more robust and that you can do ‘normal’ backup and restore. For nearly all search engines it’s a dirty secret that data can occasionally get lost in indexing (so one in a million items may go missing), and an outage can result in needing a full re-index, but this core will be different.
• Scale-out is possible on a huge scale: big enough to run O365, and big enough for any challenge you can throw at it. FAST was always great at large scale, but this is a different level; there should be less black art to building out big or high throughput systems.
Ultimately, what matters is that it works. Other than the dogfooding done at Microsoft (which is pretty big), there isn’t much production experience with SharePoint 2013 yet, but every indication is that this is an architecture that is extremely solid, for both SharePoint generally and search specifically.
Deeper Dives
• SharePoint 2013: Search Logical Architecture
• TechNet Search technical diagrams
• TechNet on Planning for SharePoint 2013
Indexing and Partitions
In SharePoint 2013, there is a brand new search indexing core that is optimized for high volume throughput and overall scalability. The index component is the core of search; it accepts and administers both content and queries. Content data is indexed and stored in index partitions while the index component simultaneously handles queries and generates results. Like many other features of SharePoint 2013, the Index Component and related architecture resemble FAST, with the ability to separate indexes into partitions for query loads and data volumes alike. This is a significant improvement over SharePoint 2010. The index is completely contained in these partitions and stored in the file system, without requiring a separate dip into SQL for metadata or for security entitlements. That is another huge improvement over SharePoint 2010, where the merge of results and security prevented deep refinement and could also slow performance to a snail’s pace.
Index partitions are separate, which provides a lot of flexibility. They can be stored individually on disk in a file set. Alternately, they can be further divided into discrete sections containing a unique index component.
Microsoft has also developed a new nomenclature to describe the structure of the index. In FAST Search for SharePoint 2010, the structure of the index and configuration was described in terms of rows and columns. Adding columns increased the amount of content you could index, and adding rows increased query volume throughput and redundancy. In SharePoint 2013 Microsoft has adopted a Partition/Replica model to define functions within the overall search index, as shown below. Partitions are logical divisions of the overall search index. The entire index is composed of the aggregation of all the primary replicas across the logical partitions. When content is sent to the indexing component, a transaction is generated to acknowledge receipt of the content. Each partition then indexes the content from this transaction log. Secondary replicas are created as read-only copies of the primary replica, for scaling query volume or adding redundancy to the overall architecture.
Within a partition, there is only one primary replica that is responsible for writing data in the partition. Each partition can be served by one or more replicas of the index. The indexing component is responsible for managing and distributing the index across partitions. If an additional partition is added, the indexing component is responsible for the re-distribution of data across all the partitions. It is important to note that you can add partitions without re-indexing the data, but removal of a partition will force a complete re-indexing of all content.
A Simpler, More Robust Approach
This new structure of the search index in SharePoint 2013 allows for a fully redundant, scalable means of indexing content. Because you are not copying index files from server to server and row to row, there is considerably lower latency in making search indexes replicated and available. This also significantly reduces the server-to-server chatter that existed in previous versions. Each partition operates independently, thereby increasing throughput and performance of the overall search sub-system. In a nutshell, the benefits of this approach are:
1. Better indexing throughput
2. Less network chatter
3. Faster availability of the search index
As previously mentioned, the indexer is now atomic, which is a major breakthrough in search technology. The change itself is invisible to you, but you’ll notice that it’s more robust and that you can do ‘normal’ backup and restore. Indexing and partitioning are deep stuff, and this is a new core capability done well.
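The Partition/Replica model can be pictured with a small in-memory Python sketch. It is a teaching model under stated assumptions, not how the SharePoint index component is implemented: the hash-based routing, the round-robin choice of replica for queries, and the class and method names are all invented for illustration.

```python
import zlib
from typing import Dict, List


class Partition:
    """One logical slice of the index: a primary replica plus read-only copies."""

    def __init__(self, replica_count: int) -> None:
        # replicas[0] is the primary; the rest are secondary (read-only) replicas.
        self.replicas: List[Dict[str, str]] = [{} for _ in range(replica_count)]
        self._next_reader = 0

    def write(self, doc_id: str, text: str) -> None:
        # The primary takes the write and the secondaries are brought in sync;
        # in this sketch that is a plain copy to every replica.
        for replica in self.replicas:
            replica[doc_id] = text

    def search(self, term: str) -> List[str]:
        # Rotate queries across replicas to spread query load.
        replica = self.replicas[self._next_reader % len(self.replicas)]
        self._next_reader += 1
        return sorted(d for d, text in replica.items() if term in text.lower())


class Index:
    def __init__(self, partitions: int, replicas: int) -> None:
        self.partitions = [Partition(replicas) for _ in range(partitions)]
        self.journal: List[str] = []  # stand-in for the transaction log

    def add(self, doc_id: str, text: str) -> None:
        self.journal.append(doc_id)  # acknowledge receipt first
        # Assumption: route by a stable hash of the document id.
        slot = zlib.crc32(doc_id.encode("utf-8")) % len(self.partitions)
        self.partitions[slot].write(doc_id, text)

    def search(self, term: str) -> List[str]:
        # A query fans out to every partition and the results are merged.
        hits: List[str] = []
        for partition in self.partitions:
            hits.extend(partition.search(term))
        return sorted(hits)


index = Index(partitions=2, replicas=2)
for i, text in enumerate(["Crawl schedule", "Index partition", "Query rules", "Crawl log"]):
    index.add(f"doc{i}", text)
```

The sketch shows why the two dimensions scale different things: more partitions spread the content, while more replicas per partition spread the queries and add redundancy.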
SHAREPOINT 2013 THE ESSENTIAL GUIDE TO ENTERPRISE SEARCH
46
CHAPTER 4 GETTING UNDER THE HOOD
Analytics
Analytics are an often overlooked area, but they have a crucial role in search, both in providing insight into user behavior and system operations and in improving the user experience. SharePoint 2013 has a new analytics architecture, which merges web analytics (where people click and navigate) and search analytics (what people search for and what results they get). This is a great improvement over SharePoint 2010, where the web analytics service application was quite limited in both capability and scale. The result is called the Web Analytics Platform, which has been completely redesigned and integrated into the search service application of SharePoint 2013. The analytics architecture consists of the analytics processing component, the analytics reporting database, and the link database (as shown below). The analytics processing component analyzes crawled items (search analytics) and how users interact with search results (usage analytics). It uses the information to improve search relevance, and to create search reports, recommendations, and deep links.
The Analytics Processing Component extracts two kinds of information:
• Search analytics information, such as links, anchor text, information related to people, and metadata, from items that it receives via the content processing component. It stores this information in the link database.
• Usage analytics information, such as the number of times an item is viewed, from the front end via the event store.
The analytics processing component analyzes both types of information. The results are then returned to the content processing component to be included in the search index. Results from usage analytics are also stored in the analytics reporting database for reporting purposes. The analytics component updates the SharePoint search index at time intervals set via a timer job, so it is independent of the crawl schedule. This can be confusing if you are trying to understand why search relevance changed. There is an extension point for custom events, but the analytics processing and search index update data flows are sealed from enrichment updates outside the SharePoint 2013 crawl.
The results are most visible to the user as reports and recommendations. But there are several other ways that analytics shows up:
• Search relevance is enhanced based on user behavior (views, click-through, etc.)
• Popularity of content and of topics in discussion threads, which is driven by the number of views as well as the number of unique viewers, can be viewed directly
• Popularity can also be used to create views through the Content by Search (CBS) Web Part
Usage analytics in WCM are particularly important, since they provide essential insight into the effectiveness of your web site. These analytics are search driven, built to scale (scaling was a weakness in SharePoint 2010), and open for extension. A “Top Pages” web part is included by default. Some data, like view counts, are also pushed into the index so that they can be included in search results, used for sorting (for example, by most viewed), and so on. Personalized search queries and personal query suggestions in SharePoint 2013 are based on analytics data and usage information for each user. Recommendations (both item-to-item and popularity based) are available through this approach, as shown below. The “recommended for you” list is simply a preconfigured Content by Search web part: it looks like a static list, but it’s generated dynamically by search.
The addition of both the link database and the analytics reporting database provides for a great deal more personalization, analysis, and relevancy within the engine. The analytics reporting database has been added to keep track of all forms of analytics. Search analytics analyze crawled items and how users interact with search results. These actions are stored in the event store within the Web Front End (WFE) server and are regularly pushed to the analytics processing component, where the actions are analyzed and reconciled. They are then pushed into the analytics reporting database and made available to the query and processing components. This allows search to keep track of user actions, queries, and trends to provide the user with better search results and suggestions. This database now powers features such as personal and engine-wide query suggestions, favorites, and other search personalization components not found in any other enterprise search platform today.
Within the analytics system, there are five parts:
• Event: Each item comes into the system as an event with certain parameters
• Filtering and Normalization: Each event is looked at for special handling, normalization, and filtering; some are filtered out
• Custom Events: You can configure up to 12 custom events in addition to what comes OOB
• Calculation: Sum or average across events
• Reports: A number of default reports are available, including top queries, most popular documents in a library or site, and historic usage of an item (view counts)
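The event, filtering, calculation, and report stages listed above can be illustrated with a minimal sketch. It is written in Python purely for illustration and does not reflect how the analytics processing component is implemented; the `Event` fields, the "crawler" filter rule, and the function names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str   # for example "view"
    item: str   # URL of the item that was touched
    user: str   # who triggered the event


def normalize(event):
    """Filtering and normalization: drop unwanted events, canonicalize the rest."""
    if event.user == "crawler":  # hypothetical rule: ignore robot traffic
        return None
    return Event(event.kind.lower(), event.item.rstrip("/").lower(), event.user)


def calculate(events):
    """Calculation: total views and unique viewers per item."""
    views = defaultdict(int)
    viewers = defaultdict(set)
    for raw in events:
        event = normalize(raw)
        if event is None or event.kind != "view":
            continue
        views[event.item] += 1
        viewers[event.item].add(event.user)
    return {item: (views[item], len(viewers[item])) for item in views}


def report(stats, top=3):
    """Report: items ranked by view count, then by unique viewers."""
    return sorted(stats, key=lambda item: stats[item], reverse=True)[:top]
```

Keeping both the raw view count and the number of unique viewers mirrors the text's description of popularity as driven by both numbers; how the two are actually weighted in SharePoint is not stated in this guide.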
The figure below shows an overview of the data flow for usage analytics, usage events, and recommendations.
Note that SharePoint 2010 web analytics aren’t supported when running in 14 mode, so running in 14 mode means running without any analytics.
Better Analytics Mean Better Search
The quality of search results correlates directly with the quality of the query and the volume of information that you provide to the search engine. In SharePoint 2013, the addition of the analytics reporting database significantly increases the quality and quantity of information that is provided to the search engine. Knowledge about the person asking the question and the community asking the question greatly improves the quality of results provided by the engine, as well as improving the quality of queries the user issues.
Deeper Dives
TechNet overview of Analytics in SharePoint 2013 »
Federation and Result Sources
Federation has been present in SharePoint since Microsoft released Search Server 2008 and Service Pack 2 for MOSS 2007. In a nutshell, this is the ability to query multiple search indexes on behalf of the user and to return all of these results together in a single view. Thanks to federation, users no longer have to use multiple search centers in order to search all content accessible within their organization. Instead they can go to a single search page and get all available results in one place.
Federating or Indexing?
Whenever someone is newly introduced to federation, the immediate next questions are: How does federation relate to indexing? Why should I continue to index remote systems if I can federate them? The truth is that indexing, if possible, is always better. If you index the content, you can control relevancy, freshness, performance, faceted navigation, and filtering for the end users (among other things). When you federate across search indices, you essentially relinquish control of these and become dependent on what the other system is capable of. With federation, your page will also be as slow as the slowest search engine queried and as relevant as the weakest search engine queried. So federating results must be done carefully.
Federation has proven very useful for scenarios where indexing may not be desirable or even feasible. For instance, your content may be spread across multiple offices with low-bandwidth connections, making any remote crawl last for days. In such conditions, you would not be able to keep your index fresh enough for your end users. Another scenario is when you have so much content to index that it may not fit within a farm. Imagine, for instance, a 50,000-employee company wanting to search across SharePoint and e-mail. Even at a low estimate of 10,000 items per mailbox (that’s roughly six months for an information worker), this would represent half a billion mailbox items to index, before counting any SharePoint content! Finally, the remote source may not allow crawling, for technical reasons or because of license restrictions (imagine a secured deep-web content provider). In these cases federation is pretty much the only way to go.
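The point that a federated page is as slow as its slowest source can be shown with a short sketch. It is plain Python using only the standard library; the source names, delays, and the `make_source` helper are made up for the example and stand in for real search engines.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def make_source(name, delay_s, hits):
    """Build a stand-in for a search engine with a fixed response time."""
    def search(query):
        time.sleep(delay_s)  # simulated network and query latency
        return name, [f"{hit} (matched '{query}')" for hit in hits]
    return search


def federated_search(query, sources):
    """Query every source in parallel and wait until all have answered."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        blocks = dict(pool.map(lambda source: source(query), sources))
    elapsed = time.perf_counter() - started
    return blocks, elapsed
```

Even with every source queried in parallel, the page cannot be assembled until the slowest source has responded, so the elapsed time tracks the largest delay rather than the average. A real deployment would typically add a per-source timeout so that one slow system does not hold up the whole page; that trade-off is outside what this sketch shows.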
Result Sources for Federation in SharePoint 2013
SharePoint 2013 offers improved federation capabilities via a feature called result sources. On top of the OpenSearch protocol already supported in MOSS 2007 and SharePoint 2010, you can now federate results from remote SharePoint farms via result sources. This allows SharePoint 2013 to better cover distributed organizations. A result source is quite easy to configure, as shown below.
While the options in SharePoint 2010 for providing organization-wide search were limited to multiple search centers or a published centralized search service, SharePoint 2013 lets you federate across farms. You can now have one farm per region or office location and federate results across farms using result sources. You can do the same between your intranet and extranet farms. While simple on the surface, this functionality fills a serious gap that existed in the overall scalability of SharePoint 2010. In the marketplace, FAST and SharePoint were criticized for not having a global systems architecture. The approach was to tell users to index all content in a large central farm, if the latency allowed. For global organizations, this was often not feasible.
There are limitations to the remote result source construct. It is limited to SharePoint 2013 and requires that all federated farms be upgraded to SharePoint 2013. Results are not interleaved, which is what users typically expect; rather, they are provided in result blocks. Refiners are also not combined in any way. Overcoming these limitations is an exercise left to partners. But despite these limitations, remote result sources are a major step forward and a great feature to use. Result sources also take over the function of scopes in SharePoint 2010. They are a more powerful tool than both scopes and federation, and are worth getting to know.
Exchange 2013 Result Source
SharePoint 2013 also allows administrators to federate results between SharePoint and Exchange, providing a unified search experience where users can search both SharePoint content and their mailboxes through a single search center without having to index the mailboxes in SharePoint. Exchange remains in control of indexing the mailboxes, and users can search across systems using federation with no additional hardware requirement. This is possible because Exchange 2013 has the same underlying search core (see the Exchange Search chapter).
Security via OAuth
SharePoint 2013 can also provide security-trimmed results in a much more streamlined way. The Kerberos protocol is no longer a prerequisite for providing security-trimmed results. Instead, SharePoint 2013 offers strong security support through federation by leveraging the claims-based authentication mechanism built into SharePoint 2013 or by using the single sign-on/secure store service. A trust must be established between the farms using a new method called OAuth, which allows the current user’s claims to be passed to the remote farm when making the search request. This is similar in concept to establishing a trust between servers to consume service applications. OAuth is a new methodology replacing Kerberos shared authentication.
When combining result sources and result blocks, administrators can offer their users a single list of results comprising both local and remote results. The remote results are shown as result blocks (one per source), either above all results or merged within the local results returned. Note, however, that faceted navigation and property filtering are still driven by local content only and do not reflect any filters or facets available from the remote indexes.
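How result blocks sit alongside local results, and why refiners only reflect local content, can be sketched in a few lines. This is an illustrative Python model only: the `build_page` function, its parameters, and the dictionary shapes are assumptions for the example and are not part of any SharePoint API.

```python
from collections import Counter


def build_page(local_hits, remote_blocks, block_position=0):
    """Compose a results page from ranked local hits and per-source blocks.

    local_hits: list of dicts with "title" and "filetype", already ranked.
    remote_blocks: mapping of source name -> list of titles, kept grouped.
    block_position: rank at which the blocks are placed among local hits.
    """
    rows = [("hit", hit["title"]) for hit in local_hits]
    # Remote results are not interleaved: each source stays one block.
    blocks = [("block", name, titles) for name, titles in remote_blocks.items()]
    rows[block_position:block_position] = blocks
    # Refiners are computed from local hits only; remote items never count.
    refiners = Counter(hit["filetype"] for hit in local_hits)
    return rows, dict(refiners)
```

With `block_position=0` the blocks appear above all local results; a larger value places them within the local ranking. In both cases the refiner counts are unchanged by whatever the remote sources return.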
Office 365 and SharePoint Online
Office 365 has made organizations more agile by enabling them to consume SharePoint as a service without having to worry about capacity, backup, or maintenance. However, it also created a new challenge: organizations migrating to the cloud were now facing siloed data, with some content available online and some content available within the organization network only. There was no single place from which to search both sets of content. SharePoint 2013 solves this by enabling Remote SharePoint result sources to also support SharePoint Online, so that SharePoint Online can federate with the on-premises search engine or vice versa. Result sources represent a key piece of technology to help organizations migrate to SharePoint Online.
Deeper Dives
Microsoft’s comparison of indexing vs. federating »
TechNet: configuring result sources »
Federation Use Cases »
Federation vs. Indexing »
Search in Exchange
Search in Exchange 2013 has been given a facelift. Pull back the curtain, and it is the same new search core used with SharePoint 2013, optimized for large volumes of e-mail. This provides a much more powerful, more effective search for Exchange users, available through Outlook and Outlook Web Access alike.
To provide some comparison, Microsoft Exchange Server 2010 Search allows users to perform full-text searches across documents and attachments in messages that are stored in their mailboxes. Exchange Search (also known as full-text indexing) creates the initial index by crawling all messages in mailboxes within an Exchange 2010 database. As new messages arrive, Exchange 2010 Search updates the index based on notifications from the Microsoft Exchange Information Store service.
The figure below shows an overview of the Exchange Search architecture in Exchange 2010. Full-text indexes are not stored in your Exchange databases. The search index data for a particular mailbox database is stored in a directory that resides in the same location as the database files. In Exchange 2013, the exsearch capability is replaced with a new search engine and index.
Another significant outcome of this change is that Exchange 2013 can appear as a result source to SharePoint 2013, as introduced in the chapter on Federation. This opens up a number of scenarios combining e-mail and other documents. In previous versions of SharePoint, you had the ability to connect to and index Exchange public folders, but not personal inboxes. That remains the same with SharePoint 2013 (unless third-party connectors are used), but now there is an ability to federate to Exchange.
The key concept to understand in regard to this functionality is that each system handles the data resident within its silo (e-mail, tasks, and contacts in Exchange 2013; documents and lists in SharePoint 2013). As discussed in the Federation chapter, there is some downside to this approach: federation does not provide the same content processing, relevance, or performance as indexing. But this level of integration between SharePoint and Exchange is a wonderful feature that will help many users. You can get a single view across Exchange and SharePoint, as shown below.
One of the key new features in SharePoint 2013 that relies heavily upon this tight integration between SharePoint 2013 and Exchange 2013 is the new Enterprise Content Management (ECM) stack and the associated e-Discovery components. From the e-Discovery perspective, the integration of SharePoint and Exchange allows for in-place preservation of information within SharePoint and Exchange. The e-Discovery console provides a dashboard view of integrated, enterprise-wide case management.
A Unified View is a Better View
Since the first release of SharePoint, there has always been a desire to search your personal inbox to provide a more holistic view of your information. In previous versions of SharePoint, there was support for indexing content from Microsoft Exchange, but only in public folders. With the release of SharePoint 2013, and because Exchange 2013 uses the same search infrastructure, it is now possible to provide federated access to personal inbox results within SharePoint 2013. The primary benefits of this approach are:
1 Exchange 2013 and SharePoint 2013 leverage the same core search sub-system
2 Possible to include federated personal inbox results from Exchange 2013
3 Eliminates the need to re-index all inbox data within SharePoint 2013
Deeper Dives
TechNet: What’s new in Exchange 2013 »
Overview of eDiscovery and In-Place Holds (SharePoint 2013) »
Search Administration
There tends to be a preconception that search requires no administration. This is due in part to the simplicity of the search interface and the general lack of awareness of how search works. But it is also due to people’s experience of web search, where they don’t have to do any upkeep. Little do they realize that Google.com has over 4,000 people administering search full time! Administering enterprise search doesn’t take that much work, but it does need to be someone’s job (even if not a full-time job). There are two main levels of administration: system administration (installation, configuration, topology management) and search administration (rules, best bets, looking for no-results searches).
Simpler Architecture Means Simpler Administration
SharePoint 2013 search is simpler to administer on many levels than SharePoint 2010 was. Part of this is that there is only one search engine core, and no hybrid architecture (see the One Search Core chapter). For FAST Search for SharePoint, you had to install two farms (a FAST farm and a SharePoint farm) and make them work together, including creating multiple search service applications. There was extra work in installation, extra work in configuration, and extra work in reconfiguring sites. There was also more troubleshooting because the architecture was more complex. There is now only one search core, only one installation, and only one search service application. The architecture is much simpler, as shown below. As a result, SharePoint 2013 is much simpler to install, configure, and troubleshoot.
Multiple Administration Components
As mentioned in a previous chapter, the Search Administration Component is now fault tolerant, a big advantage for SharePoint 2013. The administration database now contains only configuration and log information (it also held security entitlements in SharePoint 2010). There are new tools to export and import configuration information, including PowerShell commands, so there are some very cool things you can do in configuration management.
Administration at Multiple Levels
Central Administration is still your friend with SharePoint 2013, and still the place where you create search service applications. You will find some new service applications in SharePoint 2013 (such as the Machine Translation Service) and improvements to existing ones. But many of the operations will be familiar. The screenshot below shows an example list of service applications.
New in SharePoint 2013, you now have site collection level search administration too. It’s pretty similar to central administration, naturally with a few limitations. Site collection administrators can set up and manage App catalogs, and do term store management and User Profile management, as shown in the screenshot below. Site collection administrators also have the power to manage some search settings in their site collections, which is a huge step forward.
Search administration at this level is pretty comprehensive, as you can see just by looking at the search administration screen below (note that this is from Office 365, where it’s called tenant administration).
It is natural that this level of administration was introduced in SharePoint 2013 because of the emphasis on running multitenant in the cloud. Site collection administrators can start crawls, create result sources, and much more. This includes creating managed properties, which could only be done via central administration in SharePoint 2010, despite the fact that site collection or site administrators typically understand their content and crawled properties much better than central IT.
Site administrators also have much more power with SharePoint 2013. They cannot create managed properties, but they have significant control over search as it applies to their sites. The table below shows some examples of what Site Collection and Site Administrators can do.
Administering the New Mechanisms
In other chapters we described new mechanisms like Query Rules, Result Types, and Result Sources. These are very powerful for the administrator. A search service application administrator can create result sources, and site collection administrators, site owners, and site designers can also create and configure result sources in order to give powerful search options to their end users. Query Rules and Result Types can be managed down to the site level. These have a wizard for configuration (for example, the query builder interface) with a built-in preview of what the results look like. Result Sources are easy to manage, as shown below.
There are very significant improvements in Analytics, resulting from the new Analytics module. There are also better crawl reports and process reports (see below). Since the Host Controller (described in the One Search Core chapter) monitors all NodeRunner processes, it can give the administrator a lot of insight into system operations.
PowerShell is Your Friend
PowerShell support was added to SharePoint 2010 and many administrators fell in love with it, for good reason. There are even more PowerShell options in SharePoint 2013. This includes more PowerShell commands for search: general search administration, crawling, search service application, querying, metadata, and topology. In SharePoint 2013, PowerShell can now manage content sources and crawlers, not just report status. There are new options for creating a new search topology based on an XML configuration file, along with export and import commands. This means you will be able to create the same search topology in your development, test, staging, and production environments. This can be very useful for performance testing, custom development, creating standardized configurations, etc. Ranking models are still configured via PowerShell, as in SharePoint 2010, but in 2013 site collection administrators can now call a specific ranking model defined by the SSA admin from within query components at the site level. This means that site collection administrators can do much more with relevance control and ranking, choosing from a library created by the central administrator.
PowerShell is available at all levels: central, site collection, and site administration, which gives you much more power. For example, you can create a PowerShell script that configures all your search settings from the very beginning: creating a search service application, modifying its settings, creating the content sources, and so on. PowerShell can also retrieve, create, or modify query results. In addition, PowerShell can get keywords, modify ranking models, and more. If you haven’t learned PowerShell already, you will definitely want to learn it now.
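The value of describing a topology in a configuration file is that the same description can be checked and replayed in every environment. The sketch below illustrates that idea only. It is written in Python rather than PowerShell, and the XML layout, element names, and server names are invented for the example; they are not the schema that SharePoint's topology import and export commands use.

```python
import xml.etree.ElementTree as ET

# Hypothetical topology description, made up for this illustration.
TOPOLOGY_XML = """
<topology>
  <server name="search-01">
    <component type="admin"/>
    <component type="crawl"/>
    <component type="index" partition="0"/>
  </server>
  <server name="search-02">
    <component type="query"/>
    <component type="index" partition="0"/>
  </server>
</topology>
"""


def load_topology(xml_text):
    """Return a list of (server, component type, partition or None)."""
    root = ET.fromstring(xml_text)
    plan = []
    for server in root.findall("server"):
        for comp in server.findall("component"):
            partition = comp.get("partition")
            plan.append((
                server.get("name"),
                comp.get("type"),
                int(partition) if partition is not None else None,
            ))
    return plan


def replicas_per_partition(plan):
    """Count index components per partition (the primary plus its copies)."""
    counts = {}
    for _, kind, partition in plan:
        if kind == "index":
            counts[partition] = counts.get(partition, 0) + 1
    return counts
```

Running the same file through a check like `replicas_per_partition` in development, test, staging, and production is one way to confirm that each environment ends up with the same shape before any performance testing is done.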
Big Advances in Search Administration
Search administration is still a complex task in SharePoint 2013, but Microsoft has made the job much easier in this new version. The new single search core provides the power of FAST with a much simpler configuration than FAST Search for SharePoint. Search topology administration is still complex, but the topology can be much bigger and much more powerful. There are improvements in administration on all sides: crawling, content processing, query processing, analytics, and user experience. This is a search that administrators can learn to love.
Deeper Dives
TechNet: Index of Windows PowerShell cmdlets for SharePoint 2013 »
TechNet: Search topology in SharePoint Server 2013 »
SharePoint 2013 Developer Dashboard »
TechNet: Manage the search schema in SharePoint 2013 »
TechNet: View search diagnostics in SharePoint Server 2013 »
Upgrade and Migration
You will love the capabilities of SharePoint 2013, and you probably own them already. But how painful is it to move to them? Many organizations endured a very painful move from SharePoint 2003 to SharePoint 2007 and are still wary, despite a generally smooth move from SharePoint 2007 to SharePoint 2010. The good news is that this release has put a lot of focus on upgrades, and there is a lot of good material. In order to move customers on O365 to the new release, Microsoft had to develop techniques for doing this more smoothly than ever before. The bad news is that upgrades are still tricky, especially for large and highly customized SharePoint farms. Even though the upgrade itself is fairly straightforward, there are usually lots of factors besides the software itself: the hardware necessary to handle an upgrade (there are no in-place upgrades to the new version), user awareness and education, and the work needed to take advantage of new features. There are techniques that can reduce the risk and pain of upgrades, especially for search. These include cross-version federation and ‘search-first migration’. But let’s start with a look at the standard basic upgrade.
Database Attach Upgrade
The only upgrade method for going from SharePoint 2010 to SharePoint 2013 is a Database Attach Upgrade. (In-place upgrades are now only for build-to-build changes.) This works for both content databases and services databases.
The search databases have changed significantly with SharePoint 2013. The search administration database supports a database attach upgrade, but the search index databases do not. As with essentially all search engines, to do an upgrade you will need to recrawl your content. One very nice advantage with SharePoint 2013 is that you can use PowerShell to make this happen with much less effort. The Database Attach method does help a lot with search. Content sources, server mappings, schema, federated locations, scopes, best bets, and the like are all preserved and upgraded. As mentioned in the search administration chapter, there are tools for configuration import and export as well as PowerShell commands that can do very interesting things, including automating and tailoring the upgrade process.
Deferred Site Collection Upgrade
The visual upgrade available in SharePoint Server 2010 has been replaced by a deferred site collection upgrade in SharePoint 2013. This allows existing 2010 site collections to work unchanged in SharePoint 2013. No SharePoint 2010 installation is required; SharePoint 2013 includes all of the required SharePoint 2010 files. This process is much safer, because it is deeply backwards compatible. It is the default for all site collections upon a database upgrade, which then automatically run in “14 mode” on SharePoint 2013 servers. There is a new facility for health checks along with the upgrade, and a cool capability to create Upgrade Evaluation Sites. Essentially, this makes a side-by-side copy of an existing site collection and allows users to preview an existing site in “15 mode”. Deferred site collection upgrade permits use of SharePoint 2010’s UI with fewer operational hassles, while retaining the master page, JScript, SPF, and CSS applications of SharePoint 2010. This is an expensive operation, so you probably don’t want to use it everywhere, but it is a great facility to allow for safe, well-managed upgrades, both from the software perspective and the user perspective. With search, an upgrade of search centers generates result templates that include the hover panel, and which have previews (when a separate Office Web Apps server or set of servers is available). Scopes are upgraded but can’t be changed; they are replaced by the new result sources, but the corresponding result sources aren’t generated automatically.
Working Across Versions: Search-First Migration
Since search is greatly improved in SharePoint 2013, it may be worth considering a “search-first” upgrade. This lets you get the benefit of the new features and capabilities without needing to do everything at once. You can upgrade your content farms at any pace that works for you, while serving everything from SharePoint 2013 search. This pattern uses something called SharePoint 2013 Federated Services. Only a few federated services support this: Search, Profile, Social, Secure Store, Managed Metadata, and BCS. But this is everything you need to do a search-first migration.
There are several steps to a search-first migration, as shown below.
Sequentially, the steps are as follows:
1 Deploy and configure a new SharePoint 2013 Services farm, including a search center. Migrate the search settings from the SharePoint 2010 farm. When the search-first migration is complete, this farm provides search functionality to end users who are still working in the SharePoint Server 2010 farm.
2 Crawl all content in the SharePoint Server 2010 farm by using a crawler (or multiple crawlers) in the SharePoint 2013 services farm. Continue to crawl this content regularly.
3 Configure the SharePoint Server 2010 farm to consume search from the 2013 services farm, using federated services. Some things are best handled with redirects (for example, using a new search center with the new functionality can’t be done via federated services).
The search-first migration pattern opens the door to a much wider set of possibilities: hybrid solutions.
Hybrid Solutions
When you talk about upgrading search from SharePoint 2010 to SharePoint 2013, there is the potential for hybrid solutions that use different versions of SharePoint, or cloud and on-premises SharePoint instances, in the same company. Generally, hybrid means a combination of on-prem and cloud content in a single view. There are several ways to accomplish this, including indexing and federation, as mentioned in the Federation chapter. The figure below illustrates the various permutations of hybrid configurations.
Crawling and indexing content from the cloud (such as from O365) is a very solid way to create a unified view, and has the benefits that indexing generally has: unified content processing, solid and consistent relevance and navigation, and consistently fast performance. Although this scenario is not supported by OOB connectors with SharePoint 2013, there are partner-built connectors that accommodate it. With SharePoint 2013, the remote result source construct means that a view can be created using federation, specifically between O365 and on-prem SharePoint. There are limitations to the remote result source construct. It is limited to SharePoint 2013 and requires that all federated farms be upgraded to SharePoint 2013. Results are not interleaved, which is what users typically expect; rather, they are provided in result blocks. And refiners are not combined in any way. Overcoming these limitations is an exercise left to partners. But despite these limitations, remote result sources are a major step forward and a great feature to use.
Cross-version Configurations
How do these scenarios help with upgrade and migration? If you extend them to cross-version configurations, it becomes clear. Search-first migration is an example of crawling on-prem content from on-prem search (the upper left scenario in the figure above), but across versions. By crawling SharePoint 2010 content from SharePoint 2013 search, you can provide an upgrade path that can be taken a step at a time, maximizing the benefit to users while minimizing initial effort.
The same idea applies to more general scenarios. When you have more than one SharePoint farm, you can handle cross-version scenarios. You can have search on SharePoint 2013 while you have content and other applications on SharePoint 2010 or even SharePoint 2007. You can have SharePoint 2013 in the cloud with SharePoint 2010 on-prem. You can include other content in the cloud that should be crawled, such as Microsoft CRM Online or Salesforce.com. With these techniques, it’s possible to field a very broad solution with different versions and different options. This helps with many things, including migration.
Federation applies well to cross-version scenarios. Although SharePoint 2013 only supports same-version remote result sources, it is feasible for partners to create federation across multiple versions, which appears as a result source. A configuration like the one shown below provides many benefits. With respect to upgrades and migration, it means that legacy search systems can be left in place and federated into a SharePoint 2013 search center. While this is not as good as combining all content into a common index, it is a very useful technique that allows you to upgrade or migrate complex systems a piece at a time.
Cross-version Hybrid Configurations
A variant of this is support for cross-version hybrid configurations. Specifically, you may wish to adopt SharePoint 2013 online via O365 while your on-prem farms remain on an earlier version. You may not actually have a choice in the matter, since O365 will shift to SharePoint 2013 fairly quickly, faster than you may be ready to upgrade your on-prem SharePoint farms. But the remote result source mechanism in SharePoint 2013 is very powerful, and has solved many of the toughest aspects of managing hybrid configurations with O365, such as security and single sign-on. It is feasible (though not OOB) to apply this to a cross-version hybrid configuration, as shown below.
Migration From Previous Search Versions
The changes in SharePoint 2013 search are powerful and far reaching. Fielding a single new search core resolves a historical challenge with Microsoft search (a complex product lineup with many different versions). One tricky aspect of this change is that migrating from previous search versions depends on the flavor of search you are migrating from.
Migrating from SharePoint 2010 search to SharePoint 2013 search is the best supported case. There are some gotchas in this migration, as mentioned throughout this e-book, but the process is generally smooth and well covered by Microsoft. Going from SharePoint 2010 search to SharePoint 2013 is a step up in nearly every way, so there aren’t that many rough spots to consider.
If you are migrating from FAST Search for SharePoint, many of the same tools and techniques apply, but there are more corner cases and feature changes to consider.
If you are moving from FAST ESP or FAST Search for Internet Sites, there are significantly more considerations. The migration patterns and techniques still apply, but you are more likely to have a heavily customized search deployment that uses special FAST features which have been supplanted by other mechanisms. There is help available, however. Microsoft has a big ecosystem of partners, and some of them have a specific focus, tools, and techniques for this kind of migration. You may not get direct support from Microsoft, but you can tap into this ecosystem for help.
Summary: Options for Upgrade and Migration
Upgrading to SharePoint 2013 can be seamless, and there are valuable tools and processes provided OOB. The only supported approach is a database attach upgrade, so you should expect to provide extra hardware resources for your upgrade. But the deferred site collection upgrade facility provides a safe approach to upgrades and lets you delegate the work for
each site collection to the appropriate owner if you like. Upgrading search is part of upgrading SharePoint, and the standard upgrade process from SharePoint 2010 to SharePoint 2013 covers search well. But search poses special challenges: the more complex and customized your search configuration is, the more challenging the upgrade will be.
Search also offers solutions to many upgrade challenges for SharePoint as a whole. Since search bridges information silos, it can bridge across different farms, across different versions, and across on-prem and cloud instances. Techniques such as search-first migration, crawling remote farms from SharePoint 2013, and use of federation are available, not OOB but through Microsoft’s ecosystem. A unified view across these different dimensions provides users a great experience while allowing you to upgrade or migrate one piece at a time.
Deeper Dives
• Services upgrade overview for SharePoint Server 2013
• SharePoint Online administration
• BA Insight resources for Integrating O365 content
• TechNet: SharePoint Server 2010 deprecated search features
• TechNet: FAST Search Server 2010 for SharePoint deprecated features
CHAPTER 5
Applications and Development — New Models for Search-Based Applications
The New Development Model in SharePoint 2013
With a new development model for SharePoint 2013 and for search, the capability to extend search is much more accessible. We think this development will foster a lot of exciting search-based applications.
Development with SharePoint 2013 emphasizes standard web technologies such as JavaScript and HTML, client-side programming, and remote calls. There’s a focus on running applications in the cloud, and there are several options for extending the out-of-the-box capabilities of the product. There is also the option to build business solutions with no or minimal use of server-side code.
JavaScript and modern HTML and CSS know-how are important for the UI designer and developer on SharePoint 2013. It should be easier for designers to use tools they are familiar with. Visual Studio 2012 offers strong tooling for both Office 2013 apps and SharePoint 2013 applications and solutions. A key goal for SharePoint 2013 customization scenarios was to make developing applications for SharePoint much more like developing Facebook apps.
A New Programming Model
The figure below gives a bird’s-eye view of the changes between the SharePoint 2010 and SharePoint 2013 programming models. In SharePoint 2010, your custom code ran either server-side in SharePoint (as fully trusted code or in a sandboxed solution), or via a Client Side Object Model (CSOM). The SharePoint 2010 CSOM is a Windows Communication Foundation (WCF) service with three different proxies to enable Silverlight, JavaScript, and .NET managed clients to call into SharePoint remotely. With SharePoint 2013, the server-side code runs off the SharePoint server farm via declarative hooks like apps, declarative workflow, and remote events, which then communicate back to SharePoint using CSOM or REST.
There are lots of advantages to this model. Traditional SharePoint development was heavy lifting and had a steep learning curve; the new SharePoint 2013 model is much more manageable, which will open up SharePoint to a much wider audience of developers. Server-side code can impact the performance of SharePoint, be complex to install and upgrade, and can’t be run on public cloud services. The CSOM in SharePoint 2013 is much more powerful: you can do almost everything the server-side APIs did in SharePoint 2010. In addition, it supports OData, now the
leading industry protocol for performing CRUD (Create, Read, Update, and Delete) operations against data, as shown below.
Depending on your deployment scenario, you can still use sandbox and farm solutions to push server-side code to SharePoint 2013. However, Microsoft recommends that developers follow the new app model as the preferred way of building their custom applications for SharePoint 2013. The message is “don’t make any new sandbox solutions” and “build new farm solutions only if you absolutely have to”.
Apps for SharePoint
There’s a new way of packaging and deploying code in SharePoint 2013, which is aimed at development of lightweight apps. Apps for SharePoint don’t live in SharePoint. They execute in the browser client or on a remote web server; they’re granted permission into SharePoint sites via OAuth (a standard for providing delegated authorization to apps); and they communicate over the new SharePoint 2013 CSOM APIs. There are three types of apps you can build for SharePoint 2013:
1 SharePoint-Hosted App (which runs within the browser)
2 Provider-Hosted App (which runs on another web server in the datacenter or cloud)
3 Azure Auto-Hosted App (which runs in an Azure instance that is invisibly provisioned by Office 365)
Apps are simple and powerful, but they have a number of limitations, and there are still many cases where SharePoint solutions are called for instead. Anything that uses server-side code, does farm-level work, has a high level of complexity, or has installation coupling or dependencies calls for a SharePoint solution rather than a SharePoint App.
What’s Special for Search
The SharePoint 2013 Search CSOM opens most (but not all) of the Query object model functionality for online, on-premises, and mobile development; the search results data is in JavaScript Object Notation (JSON). Queries support two language syntaxes: KQL (Keyword Query Language) and FQL (FAST Query Language); SQL is no longer supported.
In addition to the CSOM, there is a REST (Representational State Transfer) service, so you can remotely execute queries against the SharePoint 2013 Search service from client applications by using any technology that supports REST web requests. The Search REST service exposes two endpoints, query and suggest, and supports both GET and POST operations. Results are returned in either XML or JSON format.
At one level, a search app is just another SharePoint app, and a search solution is just another SharePoint solution. This is revolutionary enough: it means you can use search via a REST interface, include it in an Office App, and use it easily in combination with other parts of SharePoint. But customizing search also means creating or customizing connectors using BCS or a protocol handler (see the content capture chapter), customizing linguistics using the Content Enrichment Web Service (CEWS) (see the content processing, linguistics, and CEWS chapters), working with other service applications, and more. There are numerous search-specific web parts, including the new Content by Search web part (shown below), which is a powerful “Swiss Army knife” tool.
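As a rough illustration of the Search REST service mentioned above, the following Python sketch builds GET URLs for the query and suggest endpoints. The host name is hypothetical, the helper names are invented for this example, and the quote-doubling rule is an assumption borrowed from OData string literals. A real call would also need authentication and, for JSON results, an Accept header such as application/json;odata=verbose.

```python
from urllib.parse import quote


def _odata_string(text):
    # Assumption: embedded single quotes are doubled, as in OData string
    # literals, and the whole value is then percent-encoded.
    return "'" + quote(text.replace("'", "''"), safe="") + "'"


def search_query_url(site_url, kql, row_limit=10):
    """Build a GET URL for the 'query' endpoint with a KQL query text."""
    base = site_url.rstrip("/")
    return (f"{base}/_api/search/query"
            f"?querytext={_odata_string(kql)}&rowlimit={row_limit}")


def suggest_url(site_url, prefix):
    """Build a GET URL for the 'suggest' endpoint (query suggestions)."""
    base = site_url.rstrip("/")
    return f"{base}/_api/search/suggest?querytext={_odata_string(prefix)}"


# Hypothetical site URL; the KQL restricts by title and file type.
print(search_query_url("https://intranet.example.com",
                       "title:search filetype:docx"))
print(suggest_url("https://intranet.example.com", "share"))
```

Only URL construction is shown, so the sketch runs without a SharePoint farm; any HTTP client can then issue the request.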
Search combines well with other parts of SharePoint: with content management, with workflows, with BI, and with sites. It can also be used with several of the new service applications in SharePoint 2013. These include the Machine Translation Service, which supports automated language translation of files (think multilingual search), and the Work Management Service, which provides task aggregation functionality.
If you are doing query-side-only work, you might be able to use an app model. But for the most part, developing sophisticated search-based applications will remain the domain of SharePoint solutions with SharePoint 2013. There are several things (connectors and pipeline extensibility) that are still per SSA (Search Service Application) and not per tenant.
Building Search-based Applications
SharePoint 2013 is a great platform for building search-based applications. These run a wide gamut, from configuring departmental centers using query rules and result blocks, through extending content processing to add domain-specific entity extraction, to creating brand new user experiences. They are often specific to role, industry, and topic, and they usually have a strong and measurable business value because the end users use them for specific purposes. Wherever there is an identifiable group with a need to work with unstructured content (or a mix of structured and unstructured content), there’s a need for a search-based application. We discuss this more in the chapter on Search-Based Applications.
Nothing is perfect, and there are still challenges with the search development model. Some of the limits of SharePoint 2013 include:
• On the content side, there is no ‘push’ API for content, nor an ability to do partial updates. Continuous crawls
are limited to SharePoint content only. There’s no mechanism for getting external data indexed into O365. Developers who want to approximate these from the outside have to live with limited performance and build very complex structures or use third-party frameworks.
• Many of the mechanisms inside search are sealed and can’t be extended. Update groups, query flows, analytics processing, and web crawling are examples. It’s completely understandable that these be kept safe from meddling developers, and some of them can be influenced safely using partner products. But it’s frustrating to see these mechanisms and not be able to touch them.
• The SharePoint App model and SharePoint Marketplace are aimed at lightweight, simple apps, not at something you would use for a full business application today.
Developing with search is still hard. Intrinsically, areas like content processing and relevance are imperfect, since we’re dealing with human language and subjective opinions of the ‘right’ answer. There are no joins or aggregations internal to search, so there are limits to combining structured and unstructured content. But SharePoint 2013 is far ahead of any other search platform in terms of available capabilities, performance, ease, and safety of development. And there is a strong ecosystem with available building blocks and complementary capabilities to use in creating great applications with search.
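Because there are no joins or aggregations inside search, combining search hits with structured data has to happen in the application. The Python sketch below shows one way that could look on the client side. All field names, sample values, and function names here are hypothetical, and the hits are plain dictionaries rather than real SharePoint result rows.

```python
from collections import Counter


def join_hits_with_records(hits, records_by_id, key="CustomerId"):
    """Left-join search hits to structured rows on a shared key, in app code."""
    joined = []
    for hit in hits:
        record = records_by_id.get(hit.get(key), {})
        joined.append({**hit, **record})
    return joined


def count_by(rows, field):
    """Aggregate in the application, since the index does not do it for us."""
    return Counter(row.get(field, "(none)") for row in rows)


# Hypothetical search hits (unstructured side) and CRM rows (structured side).
hits = [
    {"Title": "Contract A", "CustomerId": "c1"},
    {"Title": "Memo", "CustomerId": "c2"},
    {"Title": "Contract B", "CustomerId": "c1"},
]
crm = {"c1": {"Region": "EMEA"}, "c2": {"Region": "APAC"}}

joined = join_hits_with_records(hits, crm)
print(count_by(joined, "Region"))
```

This only works on the hits already retrieved, which is exactly the limit the text describes: the join and the counts cannot see documents outside the current result page.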
Deeper Dives
• Book chapter on developing with search from Wrox “Professional SharePoint 2010 Development”
• MSDN overview on developing with SharePoint 2013
• MSDN section on building search queries with SharePoint 2013
• SharePoint 2013 Developer Dashboard
The Content Enrichment Web Service (CEWS)
As mentioned in the Content Processing chapter, one of the biggest changes in SharePoint 2013 is the availability of the Content Enrichment Web Service (CEWS). This provides a way to add linguistic processing of any type, such as concept extraction, relationship extraction, geo-tagging, summarization, etc. With FAST Search for SharePoint, it was possible to extend the content processing pipeline through a sandboxed application, but this was both slow and limited in the information it could access. SharePoint 2013 introduces a much more open API, which makes it possible to add specialized linguistics at lower levels as well as sophisticated text analytics.
CEWS is the key extension point for content processing; in fact, it is the only extension point other than changing the content or modifying it in a custom connector. There is no Content API in SharePoint 2013 for updating metadata in the search index independent of a crawl. CEWS calls an external web service using SOAP via a proxy, as shown below.
Big Changes in Pipeline Extensibility
CEWS replaces FAST Search for SharePoint’s pipeline extensibility stage, which had a number of shortcomings. With FAST Search for SharePoint, an executable was run within a sandbox near the end of the content processing pipeline; this was a major performance bottleneck and deployment headache. Some crawled properties were available, but derived properties were not. No properties could be modified; you could return things in new properties, but only to a limited extent. And the executable was called for all content, so filtering logic was needed outside the pipeline, and the performance penalty of calling it was incurred for all content.
With SharePoint 2013, things have changed dramatically. Using a web service callout opens up many options and removes some of the difficulties in writing pipeline extension stages. The processing pipeline passes designated managed properties (including document text) to the remote service. There are hidden and read-only properties, but some managed properties (like Title) can be modified. The mechanism for CEWS is fairly simple:
• The content processing component sends a SOAP RPC call to a configurable endpoint over HTTP.
• The payload contains an array of property objects.
• The web service performs some custom logic on the array of property objects, and returns an array of modified or new property objects.
• The web service must send a response to the web service client within a given timeout.
• No specific authentication or encryption mechanisms are supported as part of the contract. You can, however, apply your own security on the transport mechanism.
A trigger condition is registered in the ContentEnrichmentConfiguration object, which allows control of when the content flow calls out to an external web service. A set of PowerShell cmdlets is provided to control the configuration, and there are robust error handling mechanisms built in.
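The property-array contract in the list above can be sketched in a few lines of Python. This models only the custom logic step (properties in, modified or new properties out); it is not a SOAP service and not the actual CEWS interface. The property name ProductMentions, the KNOWN_PRODUCTS list, and the use of a "body" property for document text are all assumptions made for illustration.

```python
# Hypothetical vocabulary for a crude entity-extraction stand-in.
KNOWN_PRODUCTS = ("SharePoint", "Exchange", "Lync")


def enrich(properties):
    """Take a list of {name, value} property objects and return only the
    properties that were modified or newly created."""
    incoming = {p["name"]: p["value"] for p in properties}
    outgoing = []

    # Modify an existing managed property: trim stray whitespace from Title.
    title = incoming.get("Title")
    if title is not None and title != title.strip():
        outgoing.append({"name": "Title", "value": title.strip()})

    # Add a new property: product names mentioned in the document text.
    body = incoming.get("body", "").lower()
    mentions = sorted(p for p in KNOWN_PRODUCTS if p.lower() in body)
    if mentions:
        outgoing.append({"name": "ProductMentions", "value": mentions})

    return outgoing


item = [
    {"name": "Title", "value": "  Q3 plan "},
    {"name": "body", "value": "Migrating Exchange and lync mailboxes"},
]
print(enrich(item))
```

Returning only the changed properties keeps the response small, which matters because the callout has to answer within the timeout.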
The advantage of this is that you can provide custom linguistics even at a fairly low level, and influence other aspects of the pipeline. The control afforded by this is wonderful and will be exciting to those wanting to address specific linguistic processing at a low level. The disadvantage is that you can’t leverage the work done in the pipeline when you are doing external processing, as shown below. This not only means extra work as a developer, but also introduces the potential for linguistic processing to get out of sync.
What to Look Out For
If you are familiar with pipeline extensibility, you will find CEWS easy to use. However, there are a variety of limitations and gotchas to look out for. One key difference between CEWS and FAST Search for SharePoint is where it is called. Specifically, you get content and managed properties after document parsing but before word breaking, as shown below.
The extensibility callouts are invoked synchronously, in line with the processing flow, so long-running enrichment tasks or batch-oriented processing tasks will require enrichment data flow management independent of and outside SharePoint 2013. Not all managed properties (and no crawled properties) are visible to CEWS, and less state (potentially useful for supplemental linguistics processing) is exposed than in FS4SP. Finally, CEWS is visible as a single logical endpoint to the potentially many content processing flow instances in SharePoint 2013. There is only one ContentEnrichmentConfiguration object active, only one trigger, and so on. This means
that throughput management and support for multiple enrichment stages (more than one instance of taxonomy classifiers or custom entity extractors) need to be managed externally, which will pose some interoperability challenges if you are interested in doing multiple types of content enrichment.
* Note: The CEWS callout is not part of O365 and is only available for the Enterprise Edition of SharePoint 2013.
CEWS is a new mechanism in SharePoint 2013. It has many nice aspects: it is a more standard, higher performance mechanism than that available in the past. It also provides the ability to modify some managed properties, making it possible to address use cases that were nearly impossible with FAST Search for SharePoint. CEWS also has limitations, and using it will require special attention by developers. But all mechanisms have limitations. Overall, Microsoft has provided a strong and essential extensibility mechanism that lets you do magic things with content processing and linguistics.
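Since only one endpoint and one trigger are active, running several enrichment stages means combining them behind that single endpoint yourself. A minimal Python sketch of that idea follows. The stage functions, the Department and Sensitivity property names, and the sample URL are hypothetical; a production version would also need to budget time per stage so the whole chain still answers within the callout timeout.

```python
def run_stages(properties, stages):
    """Apply each stage in order; later stages see earlier stages' output.
    Returns only the properties that the stages added or changed."""
    current = dict(properties)
    changed = {}
    for stage in stages:
        updates = stage(current) or {}
        current.update(updates)
        changed.update(updates)
    return changed


def tag_department(props):
    # Hypothetical classifier: derive a department from the document path.
    if "/hr/" in props.get("Path", "").lower():
        return {"Department": "HR"}
    return {}


def flag_sensitivity(props):
    # Depends on the Department value produced by the previous stage.
    if props.get("Department") == "HR":
        return {"Sensitivity": "Confidential"}
    return {}


doc = {"Path": "https://intranet.example.com/HR/policies.docx"}
print(run_stages(doc, [tag_department, flag_sensitivity]))
```

Running stages sequentially keeps dependencies between them explicit; independent stages could instead run in parallel to save time.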
Deeper Dives
• MSDN section on Content Enrichment Web Service (CEWS)
Search-Based Applications with SharePoint 2013
SharePoint 2013 is designed to support applications. Many parts of SharePoint operate out-of-the-box as applications (formerly called ‘workloads’, although this term doesn’t seem to be used much with the new release). In addition to a new development model (covered in the previous chapter), a new App model and App marketplace, and an emphasis on running applications in the cloud, there are many capabilities to leverage in building new applications. Mobile applications, which played poorly with SharePoint 2010, are fully supported now. SharePoint is, more than ever, an application platform with a set of prebuilt applications and apps included.
Search-based applications are applications like any other, except that they take advantage of search technology in addition to other elements of SharePoint to create flexible and powerful user experiences. Because search is essential for dealing with diverse content, especially unstructured content, applications using search are found everywhere, and their importance is growing rapidly, in step with the explosion of content volume. Yet search is generally not well understood or fully used by developers. Even though search is simple on the outside, it is complicated on the inside. Many people aren’t comfortable with the notion of a search-driven application until they see one.
A Platform for Search-based Applications
SharePoint 2013 is explicitly meant to support search-based applications. As the figure below shows, search is built as an extensible platform. There are both general-purpose search and some pre-built search-based applications included, and search is also used pervasively throughout SharePoint, especially in WCM and MySites. Most importantly, there are great facilities for deploying apps and applications using search, with tooling and hooks specifically for application developers. So partners and customers can create search-based applications and deploy them on the same platform.
Out-of-the-box Applications
There are three ‘general-purpose’ search applications included out-of-the-box with SharePoint 2013. Intranet search, typically used by all employees to find content throughout the enterprise, benefits from personalized search results based on search history and rich contextual previews. People search (which includes the advances from SharePoint 2010 such as phonetic name search) is integrated with Lync, which provides presence information and makes it easy to connect with people directly from search results. Site search (aimed at making public web sites easily navigable) is a big step up with this release as well. There are also search facilities built into each site; for example, every document library now has a search box at the top that enables users to search across metadata and the full text of its documents, and the result list is presented as a standard SharePoint view rather than as a results page.
A video search SBA is provided out-of-the-box, including a pre-built presentation format that makes it easy to recognize the video content you’re looking for. There are significant enhancements in video support for SharePoint 2013 generally, including a built-in HTML5 video player. The use of video, including enterprise podcasts, will be on the rise, so video search is now an important facility.
Search Driven Web Content Management
Web content management makes extensive use of search in SharePoint 2013. Search makes it possible to create compelling user experiences, and drives several key features.
Content by Search: The new Content by Search web part displays indexed content, letting you show content dynamically across multiple site collections. Users don’t know this is search powered; it just looks like well-presented content, as illustrated in the screenshot below. For a case like online catalogs, this is an essential mechanism and one that works very well.
There’s an HTML-based presentation template model that makes it easy to fine-tune the look and feel, and built-in web part editors to set up the query driving the content presentation, as shown below. This doesn’t require writing any code and is well within the reach of a business analyst. You see immediate previews of what the results will look like.
Metadata Navigation: As described in the chapter on refinement and faceted navigation, facets are available for users to drill into content. In addition to refiners (which are driven from the values in the content), metadata navigation defined from values in the term store is available.
Page hierarchies, URLs, and Topic Pages: Pages and page hierarchies are easily defined from the term store. You can also generate topic pages, which makes SEO straightforward. The figure below illustrates how this works; SharePoint now generates ‘friendly URLs’, which makes this process work like any ‘normal’ site.
Recommendations: A new recommendation facility is included, which can surface suggestions based either on popularity or on correlations between items (see the chapter on Analytics).
There are other exciting things about WCM with SharePoint 2013. Standard web design tools and workflows are supported; there are great facilities for content variations, including a built-in language translation service; you can publish easily across sites; and video and images are easily embedded and beautifully rendered across multiple devices and resolutions. The URLs generated are clean, and search-engine optimization is directly supported. You can use catalog-enabled sites for scenarios such as a content repository, knowledge base, or product catalog. But the heart of WCM in this release is search, which makes dynamic page generation and remarkable site experiences possible.
MySites: Driven by Search
The social features in SharePoint, including MySites, are dramatically enhanced, building on the capabilities introduced in SharePoint 2010. SharePoint 2013 adds new features that improve and facilitate enterprise social activities within the organization: you can follow people as well as content, share personal documents easily and keep track of access, and keep up to date with activities of interest. Under the hood, there are two lists for providing social features: the Microfeed list and the Social list.
Search drives several key social features in SharePoint 2013, even ones where it’s not apparent that search is used under the hood. Clicking on a hashtag in a post or discussion shows a list of all conversations about that topic enterprise-wide. In MySites, users can access a list of all SharePoint tasks assigned to them, regardless of which sites the assignments are stored in. They can also see the documents they are following, as illustrated below.
Another example is “My docs: shared with me”, which shows you all the documents shared with you from everyone’s My Documents. It looks like a form view but, in reality, it uses search underneath to aggregate content from all MySites across site collections. Behind the curtain, there’s a query against a “ShareWith” field for your name, which also filters out docs shared with everyone. All results are security trimmed, naturally.
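The kind of query described for “shared with me” can be sketched as a KQL string with a property restriction and an exclusion. The Python below is only a guess at the shape of such a query: the field name ShareWith comes from the text above, while the “Everyone” value and the helper name are assumptions, and the query SharePoint actually issues may differ.

```python
def shared_with_me_kql(user_name, field="ShareWith", everyone="Everyone"):
    """Build a KQL string: documents shared with the user, excluding
    documents shared with everyone (names here are assumptions)."""
    # Strip double quotes so the name cannot break out of the quoted phrase.
    safe_name = user_name.replace('"', "")
    return f'{field}:"{safe_name}" NOT {field}:"{everyone}"'


print(shared_with_me_kql("Jane Doe"))
```

Security trimming is not part of the query text; the search service applies it to whatever the query returns.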
e-Discovery: Driven by Search
SharePoint 2013 has gone one giant step further toward fielding a full e-Discovery application. There is now unified discovery across Exchange, SharePoint, and Lync, as shown below. Exchange now has the same search infrastructure as SharePoint, which makes unifying the search much easier (Lync archives via Exchange). The Discovery Center in SharePoint uses this to provide a unified console, with in-place holds that don’t impact the end user’s ongoing work.
There’s more to e-Discovery than search, of course: preservation, holds, policy management, and export. But search is the cornerstone and is what makes it possible to recall all the information needed to react to legal actions, without getting irrelevant information that you have to sift through. The e-Discovery functionality in SharePoint Server 2013 is a big step up from
SharePoint 2010, and is probably the first time you could consider this to be a full application. There are several parts to e-Discovery:
• A site collection where you perform e-discovery queries across multiple SharePoint farms and Exchange servers and preserve the items that are discovered.
• In-place preservation of Exchange mailboxes and SharePoint sites (including SharePoint list items and SharePoint pages) while still allowing users to work with site content.
• Support for searching and exporting content from file shares.
• The ability to export discovered content from Exchange Server 2013 and SharePoint Server 2013.
The eDiscovery Center site template creates a portal for discovery cases and lets you conduct searches, place content on hold, and export content. For each case, you create a new collaboration site that uses the eDiscovery Case site template. You can export the results of an eDiscovery search for later import into a review tool.
SharePoint 2013 provides in-place holds: content that is put on hold is preserved, but users can still change it. The state of the content at the time of preservation is recorded. If a user changes the content or even deletes it, the original, preserved version is still available.
To implement eDiscovery across an enterprise, you configure SharePoint 2013 Search to crawl all file shares and websites that contain discoverable content, and configure the central Search service application to include results from Exchange Server 2013. Any content from SharePoint 2013, Exchange 2013, or a file share or website that is indexed by Search or by Exchange Server 2013 can be discovered from the eDiscovery Center.
Customize, Extend, and Create New Search-Based Applications
Search-based applications are found over a very wide range of roles, industries, and levels of sophistication. There are common patterns to these applications; the table below shows just a few of these application patterns. SharePoint 2013 provides models that span a spectrum from simple configuration, through extension of capabilities, to creation of new sophisticated search-based applications. The new mechanisms in SharePoint 2013 for customizing user experience (query rules, result blocks, and result sources) and the ability to theme SharePoint easily provide a lot of power
for customizing search experiences without any code at all. Many areas can be extended (connectors, content processing, relevance, query processing, and UI) with moderate effort and standard tools. Fully custom code is supported as well.
We find that the use of modular building blocks speeds the construction of search-based apps dramatically. Since these applications follow common patterns, a relatively small number of sophisticated modules can cover a large number of applications. If you undertake a sophisticated search-based application, consider what’s available on the market as well as what you might build yourself, since pre-built building blocks can save substantial time and reduce risk.
Deeper Dives
• Book chapter on developing with search from Wrox “Professional SharePoint 2010 Development”
• Overview of eDiscovery and In-Place Holds (SharePoint 2013)
• Blog on using the Content by Search Web Part
• BA Insight
• Search as a Development Platform
• TotalView Search-Based Applications
Acceleration of Search
With SharePoint 2013, Microsoft has taken a big step forward in helping people do more with search:
• It is far easier to own and use high-end search capabilities
• Search is used pervasively
• Some search-based applications are built-in
• It is easier to create and operate tailored search-based applications
We expect many more interesting applications to emerge as a result.
Conclusion
This e-book has covered a lot of ground, since SharePoint 2013 has so many underlying changes, new capabilities, and new features. We’ve tried to cover everything in concise, readable chapters, across five major sections: User Experience; Working with Queries; Working with Content; Architecture, Deployment, and Operations; and Development and Applications.
[Figure: the five sections of this e-book: User Experience; Working with Queries & Results; Working with Content; Architecture, Deployment & Operations; Applications & Development]

This new platform has a lot to love. It:
• Is clean, fast, and easy to use
• Is straightforward to install, administer, and scale
• Provides very powerful high-end search features
• Makes creating search-based applications simpler than ever

Microsoft has done a remarkable job making this high-end technology accessible and easy for the mainstream. However, it is not a perfect platform, and there are still challenges with search. Search is, after all, a journey.
BA Insight is entirely focused on the road that lies ahead for search and SharePoint 2013, and we stand ready to help you on your journey. As you learn more about SharePoint 2013 and search, here are some things to consider and some steps we’d suggest:
THINGS TO CONSIDER AND SUGGESTED NEXT STEPS

Thing to consider: SharePoint 2013 includes a very powerful new search engine.
Suggested next step: Get to know the new release ASAP: download the bits, read about it, and confer with folks who know it.

Thing to consider: There are new mechanisms in SharePoint 2013 (result sources, query rules, and result types) that replace familiar ones and take some getting used to. These are now in the hands of site collection administrators and site administrators, so there is much more control at that level.
Suggested next step: Try to develop a champion among your site administrators who learns the new tools. Set up a playpen system where people can get used to the new mechanisms.

Thing to consider: Crawling and BCS have evolved further in SharePoint 2013, including a new continuous crawl feature; however, connectors are still largely left to partners.
Suggested next step: Take stock of your current and future content sources and think about extending search to more content. Look at learning how to make simple connectors yourself, and at Microsoft connector partners for more complex systems.

Thing to consider: The new search core in SharePoint 2013 is different from either FAST or SharePoint 2010, and you will notice improvements in relevance, performance, and robustness.
Suggested next step: Consider how quickly you can migrate to the new platform. Factor in techniques that allow you to upgrade a step at a time, such as search-first migration and cross-version federation.

Thing to consider: Hybrid configurations across on-prem SharePoint 2013 and O365 are supported OOB using result sources. Cross-version configurations are not supported OOB, but there are techniques and partner products for these cases.
Suggested next step: Consider adopting O365 quickly, in ways that don’t require you to do it all at once. Talk to Microsoft Partners about federation, cross-version configurations, and migration.

Thing to consider: Though SharePoint 2013 Search is great, there are still limitations and cases where the mechanisms don’t cover what you wish to accomplish.
Suggested next step: Look to the Microsoft partner ecosystem for training, components, and innovative solutions.

Thing to consider: The term store is now an administrative center for entity extraction, query suggestions, faceted refinement, WCM page hierarchies, and more.
Suggested next step: Get familiar with the term store. Find out where there are key lists in your organization (product names, project names, industries, etc.); you will be able to import these into the term store and use them for entity extraction.

Thing to consider: If you are coming from FAST, you will recognize a lot of concepts and powerful features. But you will also notice a number of things ‘missing’.
Suggested next step: Focus on the problem, not the specific mechanism; there’s a way to get it solved with this platform. Turn to Microsoft Partners for products that round out all the possibilities.

Thing to consider: SharePoint 2013 has a new development model that is lightweight and available to a much wider range of developers. Search in SharePoint 2013 is a powerful platform designed to support search-based applications.
Suggested next step: Consider applying JavaScript developers to building SharePoint Apps. Look around your organization for opportunities to apply search-based applications.