
Azure Functions, Accessibility, Machine Learning, Data Visualization

NOV/DEC 2019
codemag.com - THE LEADING INDEPENDENT DEVELOPER MAGAZINE - US $ 8.95 Can $ 11.95

What’s Your Superpower?

SQL Server and Machine Learning
An Introduction to Digital Accessibility
Get Started with Serverless Azure Functions
TABLE OF CONTENTS

Features

8 Enhance Your Search Applications with Artificial Intelligence
Search is everywhere. But unless you add it to your app, you won’t find it there! Sahil examines the various search tools in the Microsoft ecosystem and shows you how to make the most of them.
Sahil Malik

14 Synchronizing the In-Browser Database with the Server
Craig shows you how to gracefully resolve conflicts and synchronization issues with disconnected databases.
Craig Shoemaker

22 Get Started with Serverless Azure Functions
Azure Functions take care of most of the server-related problems tied to hosting. Julie shows you how to integrate them with your own app and then monitor the results.
Julie Lerman

28 Women in STEM: An Interview
Whether you’re in the middle of your career or just starting out, women in science, technology, engineering, and math (STEM) have unique challenges. Listen in as Sumeya and Sara interview each other about it.
Sumeya Block and Sara Chipps

32 Stone Soup: Cooking Up Custom Solutions with SQL Server Machine Learning
SQL Server 2017 has machine learning services baked right in. If you’ve been wondering how to use it, you’ll be fascinated by what Jeannine serves up.
Jeannine Takaki-Nelson

44 Emotional Code
Whether you know it or not, your code says something about you. Kate tells you how to read emotions in existing code and how to be a better member of the coding community when writing your own.
Kate Gregory

50 POURing Over Your Website: An Introduction to Digital Accessibility
Everyone knows that there are standards when it comes to building apps. And most people know that there are standards for accessibility. But did you know that writing accessible apps is better for everyone? Ashleigh shows you what to think about the next time you sit down to create something.
Ashleigh Lodge

54 Best Practices for Data Visualizations: A Recipe for Success
Helen shows you the ins and outs of creating really useful charts and graphs with Tableau. You’ll never make a boring old pie chart again.
Helen Wall

Columns

74 What Captain Marvel Can Teach Us About Management
Dian spends an evening re-watching Captain Marvel with a group of friends and they realize that there’s a lot more to that movie than just a rollicking good film.
Dian Schaffhauser

Departments

6 Editorial
24 Advertisers Index
73 Code Compilers

US subscriptions are US $29.99 for one year. Subscriptions outside the US pay US $49.99. Payments should be made in US dollars drawn on a US bank. American Express,
MasterCard, Visa, and Discover credit cards are accepted. Bill Me option is available only for US subscriptions. Back issues are available. For subscription information,
send e-mail to subscriptions@codemag.com or contact Customer Service at 832-717-4445 ext. 9.
Subscribe online at www.codemag.com
CODE Component Developer Magazine (ISSN # 1547-5166) is published bimonthly by EPS Software Corporation, 6605 Cypresswood Drive, Suite 425, Spring, TX 77379 U.S.A.
POSTMASTER: Send address changes to CODE Component Developer Magazine, 6605 Cypresswood Drive, Suite 425, Spring, TX 77379 U.S.A.

4 Table of Contents codemag.com


EDITORIAL

Sock, Sock, Shoe, Shoe


I was at the opera, and the main character pretended she was a man for most of the evening. She began the show by dressing so we could see that she was a woman pretending to be a man and I was struck by how she put on her shoes: first one sock, then the shoe, then the other sock, and the other shoe. That’s how I put on my socks and shoes.

The next day, I was at the acupuncture clinic, and I noticed someone putting on his shoes: both socks first, then both shoes, starting on the right both times.

I asked a few people, and everyone seemed to have really strong opinions about the correct order of such things. Several people explained that they put on both socks first “in case there was a fire.” (Wouldn’t you rather have one shoe than no shoes in a fire? What if you needed to stomp the fire out?) People were very committed to the “correct” order of things.

It got me thinking about what other things we do in a certain order by rote: brushing teeth, brushing hair, starting the car and putting on a seat belt. I walk every day and always take the same route or some version of the same route. I told myself it was because I loved the walk so much. So I decided to prove it. I walked the route in the opposite direction.

A walk that normally takes an hour and a half took two and a quarter hours! I had trouble getting across a major intersection; normally I crossed there an hour later when the traffic had died down. People I routinely nodded and smiled at and said good morning to didn’t recognize me (nor I them until I realized that I wasn’t paying attention). I noticed houses and shops that I’d never seen before, I got a whole different view of some massive construction, and I began to tire right where I usually got to my happy place.

I know that I’m a creature of many habits, so I decided that it was clearly time to shake things up. I sat to my work in a different place, listened to different music, practiced my music at a different time of day, watched different things on television, wore different clothes than usual, did my shopping and laundry on different days, went to bed at a different time, and even brushed my teeth in a different order. Some of the changes felt refreshing, some were annoying or felt like interference, and some, like the walk reversal, were interesting and informative.

The most useful thing was what I found energizing: working in a different place, starting at the other end of the To Do List, changing the music I listen to when I’m working.

It seems logical that shaking up the usual order of things would be useful in a larger project with more staff, too. Not change for the sake of change if it’s disruptive, but maybe the alphabet could start in the middle every now and then.

Back in the day, when we used to edit on printed-out copies, I learned a neat trick for catching things like repeated words or extra spaces, things that are hard to catch under normal circumstances. Turn the page upside-down. As it happens, I have some peculiarity in my brain that lets me read and write upside-down with great facility (I can’t write in cursive upside-down, only printing, which makes me wonder if I learned this trick before I learned cursive), but even so, I found more errors like that than I did when editing right-side up.

Another trick is for when I get stuck while writing. Write it out of order! You don’t have to start at the beginning of the article/chapter/book/paper/story when you’re writing any more than you have to start at the beginning when you’re reading. Write the bits that interest you most or that are easiest to write, and then go back and introduce them. Or, if you’re really stuck, write each heading/chapter title/topic on three-by-five cards and toss them in the air. Start by writing the cards that land right-side-up. Or the upside-down ones. Boom! Responsibility for writing in order is over. Writing this way guarantees that you’ll go back and read from end to end to make sure it all gels. (Believe me, your editor knows that you didn’t reread your article a single time because of all the stupid typos, but also, things written in haste read like things written in haste.)

Software development projects are commonly done out of order. Sometimes you need to test the viability of a project by tackling the most difficult bits first. Sometimes you start with the output of a project: reports and user interfaces come to mind. Sometimes it’s just how you delegate responsibilities for features. Assigning people to do the various tasks in a large project gives you the opportunity for different views about how to proceed. Even the act of delegating tasks gives you different perspectives on timing, tooling, and approaches.

There are many ways to shake up your process, and I plan to be more interested in them, and to take note when I’ve fallen into a habit. Even so, it’s just wrong to put on both socks and then both shoes. That’s just crazy talk.

Melanie Spiller

6 Editorial codemag.com
ONLINE QUICK ID 1911021

Enhance Search Applications with AI


Sahil Malik
www.winsmarts.com
@sahilmalik

Sahil Malik is a Microsoft MVP, INETA speaker, a .NET author, consultant, and trainer. Sahil loves interacting with fellow geeks in real time. His talks and trainings are full of humor and practical nuggets. His areas of expertise are cross-platform Mobile app development, Microsoft anything, and security and identity.

Search. It’s so ubiquitous, so easy, yet so difficult. Users expect to see that friendly search box in their applications. They seem to really like it, because it’s so simple to use. You don’t need a user manual to figure out search. In fact, if your application doesn’t have search, you’ll be pelted with negative reviews. No wonder you see search in so many applications. Yet, search is hard.

It’s very difficult to implement. We all know it’s more than just simple text matching. Even simple text matching isn’t easy. Those of us with database backgrounds know that searching for “prefix*” is a lot easier than searching for “*suffix”. And users want to do all sorts of weird searches like “*run*”, which should match ran, or shrunken or brunt, or, well, you get the idea. Quick search results and performance are important, as are accuracy and ranking. You almost have to read the user’s mind. And then there’s the whole idea of keeping your search results fresh. Not an easy task, is it?

What’s amazing is that all that complexity barely scratches the surface of the endless possibilities. About 70% of the data on the Internet is visual. Photos and videos. Another big part is audio. Wouldn’t it be useful to be able to search through audio and video as well?

Have you ever caught yourself asking a question such as, “I have this tune stuck in my head, what song is that?” Yes, we know there are apps that’ll do that on your phone. But what if that power was brought to your corporate world? Say, “Someone said xyz in a meeting or perhaps an email, or maybe it was a document, I wish I could find out easily where xyz was said and by whom.” Personally, I struggle with this deluge of information every day. Finding that needle in the haystack when my boss is on the call with me is something I deal with far too often.

Search is incredibly powerful. It saves the user’s time. In this article, I’ll show you how you can build an application with a very gentle learning curve that allows you to build such functionality and more.

But first, let’s start with clarifying the various search products available in the Microsoft space.

Search Products
In the Microsoft ecosystem, there are multiple search products with overlapping names. The Microsoft Department of Confusing Naming can sometimes do a great job, so it is best to clarify them first.

Microsoft has three search products.

The first is Bing, which you can find at www.bing.com. It’s an Internet-facing search engine, it’s free to use, and searches execute against the open, anonymous Internet.

The second is search under Cognitive Services, not to be confused with cognitive search under Azure Search, which is an entirely different product. You can read more about Cognitive Services search at https://azure.microsoft.com/en-us/services/cognitive-services/directory/search/. But, to put it simply, this is your way to tap into the power of Bing, to create an ad-free search experience, completely brandable to your requirements, available as a paid offering.

Finally, there’s Azure Search, which is the focus of this article. Azure Search is one of the products under the Azure umbrella. It allows you to create your own private search corpus in the cloud. It’s best viewed as a cloud-hosted, Internet-scale, search-as-a-service solution. It allows you to search your data, in an index you define, with documents you put in the index, at a schedule you define. All this, but with none of the complexity that’s typically paired with an enterprise-class search product. Microsoft Azure manages all of the infrastructure complexity, and, as I mentioned earlier, I assure you, the learning curve here is indeed quite gentle.

One pretty amazing capability of Azure Search is the ability to enhance it with the power of AI using the cognitive capabilities of Azure Search. The typical process of search is to define the index, import data, and execute queries. Cognitive capabilities allow you to make further sense of the imported data. For instance, a video could be further deciphered into the people appearing in the video, and speech-to-text capabilities could make the spoken text in the video searchable. Or you could use OCR capabilities to make the text in the images searchable. I’ll show you how to do all of this in this article.

Again, I assure you, the learning curve is quite gentle.

Create a Simple Search Engine
The best way to learn how to swim is to dive in. Without much further ado, let’s go ahead and build a simple search engine. I’ll do it using Azure Search, and I’ll explain the important concepts as I go along.

The first thing I’ll need is the data I wish to search. There are two ways to put data in an Azure Search index: Push and Pull.

Push Data into Azure Search
The first way to get data into your Azure Search index is by pushing data into it. Azure Search comes with a REST API, or .NET and Java SDKs. You can choose to push any searchable data in the index using this push-based mechanism. Certainly, this has its advantages. You can now make almost anything searchable, as long as you can programmatically push the data in. Also, you control how and when new data becomes searchable. This means that if you have a specific requirement where new data must become searchable with a very short latency, the push-based mechanism is what you need.

At a high level, the process of pushing data involves you defining an index first. When you define an index, you get to define a lot of details, such as what columns in the entity are searchable, which columns are retrievable in search results, which you can perform facets on, etc. Once you define such an index, you can push documents in that match that data structure.
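The define-then-push flow described above can be sketched in Python. This is only an illustration under stated assumptions, not code from the article: the service name, index name, field list, sample document, and key are hypothetical placeholders, the field attributes simply mirror the searchable, retrievable, and facetable choices mentioned above, and the request is built but never sent.

```python
import json
import urllib.request

API_VERSION = "2019-05-06"  # same API version as the REST calls in this article


def build_index_definition(index_name):
    """Describe a small index; the field names are illustrative assumptions."""
    return {
        "name": index_name,
        "fields": [
            # Exactly one field acts as the document key.
            {"name": "CustomerID", "type": "Edm.String", "key": True},
            {"name": "CompanyName", "type": "Edm.String",
             "searchable": True, "retrievable": True},
            {"name": "Country", "type": "Edm.String",
             "searchable": True, "filterable": True, "facetable": True},
        ],
    }


def build_push_request(service, index_name, admin_key, documents):
    """Build (but do not send) a request that uploads documents to an index."""
    url = (f"https://{service}.search.windows.net/indexes/{index_name}"
           f"/docs/index?api-version={API_VERSION}")
    # Each document carries an action; mergeOrUpload inserts or updates by key.
    body = {"value": [{"@search.action": "mergeOrUpload", **doc}
                      for doc in documents]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical service name, key, and sample document.
    request = build_push_request(
        "my-search-service", "customers", "<admin key>",
        [{"CustomerID": "AROUT", "CompanyName": "Around the Horn",
          "Country": "UK"}],
    )
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would actually send it.
```

Building the request separately from sending it keeps the sketch runnable without an Azure subscription; in a real application, the SDKs mentioned above would usually replace the hand-built URL and payload.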

8 Enhance Search Applications with AI codemag.com


Have Azure Search Pull Data
Azure Search can also pull data in using indexers. An indexer in Azure Search is a crawler that extracts searchable data and metadata from an external Azure data source and populates an index based on field-to-field mappings between the index and your data source. There are indexers available for Azure SQL, Azure Cosmos DB, Azure Blob Storage, and Azure Table Storage.

The process of using indexers is fairly straightforward. You first need a data source matching one of the supported data sources for which indexers are available. Then you can either define an index or set up an import data job. As a part of import data, the indexer can query the data structure and suggest an index structure to you, which you can tweak further. And then you can perform a one-time import or set up a recurring schedule for newer data to become available in search results.

The obvious advantage of having Azure Search pull data is that you can set it up with simple point and click. The disadvantage is that you can only pull data from data sources for which an indexer is available. And you have to wait for the indexer to run again for newer data to show up in search results.

It’s also important to consider that although indexers are commonly set up to run on a schedule, it’s also possible to run an indexer on demand using the REST API and the command shown below.

    POST https://[service name].search.windows.net/indexers/[indexer name]/run?api-version=2019-05-06
    api-key: [Search service admin key]

You can also get the status of the currently running indexer easily, as shown below.

    GET https://[service name].search.windows.net/indexers/[indexer name]/status?api-version=2019-05-06
    api-key: [Search service admin key]

Although these operations give you some flexibility, they won’t be as efficient as just pushing a document into an index, like the push mechanism allows you to do. That’s because when you kick-start an indexer and check its status, the indexer still has to find all new changes and then pull them in, one by one. But hey, it’s a good middle ground between getting fresher search results and not having to write a lot of code.

Set Up a Data Source
For the purposes of this article, I’ll index the Northwind database. Yes, that old, tired, boring Northwind database. You can grab the script for the Northwind database from here: https://github.com/Microsoft/sql-server-samples/tree/master/samples/databases/northwind-pubs. Why did I pick Northwind? Well, I didn’t have to. I just want a data source. But feel free to target any other similar content source.

Once I set up the data source, I see the usual Northwind tables. One of those tables is the Customers table that I intend to make searchable. This table is rather interesting. It contains information about the customers’ names, their companies, contact names, contact titles, and so much more. Of special interest is the country column. It has 21 countries. Perhaps it would be useful to treat this column differently. For instance, maybe I want to issue queries for “sales representative” in Brazil.

But, before you can do any of that, you need to set up a search instance.

Create a Search Instance
Creating a search instance is rather simple. You simply create an instance of “Azure Search.” It’ll ask you the usual questions, such as what resource group you wish to put this search into, the name of the search instance, the location, etc. The most important question it’ll ask is what pricing tier you’d like to put this search instance into.

Search can be provisioned in free, basic, standard, or storage optimized tiers. Free is fine for this tutorial or testing simple systems. The biggest downside of the free tier is that you can’t scale it. Basic, standard, and storage optimized can be scaled, but storage optimized is optimized more for storage: your indexing is quicker, but your query latency is poorer. You scale the search instances via search units, which are a combination of replicas and partitions.

Partitions provide index storage, and IO for read/write operations. The more partitions, the quicker the indexing. Replicas, on the other hand, are instances of your search service. They’re used to load balance your query operations. Each replica always hosts one copy of an index. If you have six replicas, you’ll have six copies of every index loaded onto the service.

It is important to realize that:

• There’s no feature set difference between free, basic, standard, and storage optimized.
• The only difference is scale.
• Replicas aren’t your answer for disaster recovery. For proper disaster recovery, you need to create a separate and identical search instance in another data center.

There is another tier, the S3HD tier. S3HD is designed for multi-tenant environments and it has a feature set difference. Indexers are not available in S3HD.

For the purposes of this article, go ahead and provision an instance of Azure Search under the free tier.

Once inside Azure Search, you’ll see a number of interesting things. Right through the portal, you can choose to scale it. Because you went with the free tier, this will be disabled. All tiers, including the free tier, give you access to keys. There are two kinds of keys: admin and query. The admin key can be used to programmatically affect search service configuration. You can create up to a maximum of two equivalent admin keys. Or you can create up to 50 query keys, which only let you query data. An admin key also lets you query data, but an admin key is a lot more powerful than a query key; therefore, you should use query keys for pure querying functions.
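The admin key is what the on-demand indexer calls shown earlier expect in the api-key header. As a rough Python sketch of how those two calls could be assembled: the function name, service name, and indexer name below are hypothetical, and the requests are only constructed, not sent.

```python
import urllib.request

API_VERSION = "2019-05-06"  # version used by the REST calls in this article


def build_indexer_request(service, indexer, admin_key, action):
    """Build the 'run' (POST) or 'status' (GET) request for an indexer."""
    methods = {"run": "POST", "status": "GET"}
    if action not in methods:
        raise ValueError(f"unsupported indexer action: {action!r}")
    url = (f"https://{service}.search.windows.net/indexers/{indexer}"
           f"/{action}?api-version={API_VERSION}")
    # Both calls authenticate with an admin key, not a query key.
    return urllib.request.Request(
        url, headers={"api-key": admin_key}, method=methods[action])


if __name__ == "__main__":
    # Hypothetical service and indexer names.
    for action in ("run", "status"):
        req = build_indexer_request(
            "my-search-service", "northwind-indexer", "<admin key>", action)
        print(req.get_method(), req.full_url)
```

Keeping the admin key confined to back-end helpers like this one, and handing only query keys to front-end code, follows the key separation recommended above.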



Import Data and Create an Index
Right through the Azure portal, at the very top, you’ll also see a button to import data. This can be seen in Figure 1.

Figure 1: The import data button at the top of your search instance overview page

Clicking on that import button gives you a simple form to fill in, where you can point this indexer to your SQL server. Feel free to explore what other indexers are available to you.

You can see the configuration information I provided in Figure 2.

Figure 2: The import data screen for Azure SQL database

Once you click the Test connection button, you’ll see a drop-down appear with all the tables and views that can be indexed. Click the Next button. For now, I’ll skip all settings, including Cognitive Services details, and go to Customize target index. Here, you can define which columns are returned in search results, which columns are searchable, and you can even define which columns are filterable, sortable, and facetable. The Country column is a great candidate for these because there are fewer and more well-defined distinct values in this column. You can see how I’ve set up my index in Figure 3.

Figure 3: My Azure Search index

Clicking Next creates the index and starts the first crawl to put data into the index. You can examine the progress of your search instance on the overview page. If you wish to dive into the details of any particular indexer, you can view it under the Indexers tab on the overview page. Certainly, all this information is also accessible via the API.

Because the Customers table has only 91 records, in almost no time, you’ll see the indexing operation completed.

Execute Search Queries
You can execute search queries directly through the Azure portal. Remember this is for you, the administrator, to test the queries. To integrate searches within your applications, you need to make a REST call to a request URL, with the api-key header. The value of the header is the query key.

Here’s a little tip. You’ll pay for data egress costs, but you only pay for what leaves the data center. So, if you have a Web front-end for the search results, place it in the same data center as your search instance. That way, you only pay for the data egress once.

Let’s execute some search queries now. Under the search explorer button, as can be seen in Figure 1, type in a search query. For instance, I’m trying to search for “London.” This can be seen in Figure 4.

Figure 4: Searching for London

That’s fantastic! Just like that, I was able to search for all customers that had the word “London” anywhere in their entity.

I can even do some wildcard searches; for instance, try searching for AR*. You’ll see that all of the objects returned have a term starting with “ar” somewhere in the object. Also note that the returned object, as can be seen in Figure 4, contains all of the columns that you marked as retrievable when defining the index.

Remember that the Country field was special? You made it filterable. Can you search for AR* in just the UK? Sure, just use the search query like this:

    AR*&$filter=Country eq 'UK'

This simple query should now show you the customers with the pattern AR only from the UK. Integrating this within your application is also quite trivial. All you need to do is pick the request URL from Figure 4 and execute a simple REST call to that URL with your query key in the api-key header.

Congratulations, you’ve just made yourself a neat little search engine with your data.

Leveraging the Power of AI
You have a neat little search engine and it wasn’t too hard to create. Everything I showed via the Azure portal can also be built using the REST API or the .NET or Java SDKs. And remember, this example that I showed used an indexer to query the Northwind database. What if you don’t have an indexer for the objects you wish to have made searchable? For instance, what if the data resides in an ERP system that has a weird arcane Web API? You can still push the objects in, in a neat and clean JSON format that matches your index.

“Neat and clean JSON format.” Did that make you hiccup? We all know that the real world is hardly neat and clean. The real world is messy. So in my next example, I’m going to leverage the power of AI to make sense of unstructured data via search.

In order to do so, I’ll use a fantastic capability of search called Cognitive Search. Put simply, Cognitive Search is a bunch of skillsets that leverage the power of AI to make sense out of unstructured data. For instance, you can OCR text out of images and make those images searchable. You can submit a bunch of pictures and have AI recognize celebrities in those pictures. Or you can do speech to text and so much more. Where the out-of-the-box abilities fall short, you’re welcome to write your own skill.

For this part of the article, I had a hard time coming up with a good example, so I just took a screenshot of this article I am writing. Seriously, the text you see here, unedited so far by the editor, is a screenshot I took of it and decided to make it searchable. The goal is, via OCR, I want to be able to search through the text of this article. You’re welcome to make this more compelling by uploading pictures of other kinds, such as landmarks, celebrities, your dog, whatever floats your boat.

Back in the search service instance, go ahead and delete the previous index. I’m doing that just to keep my search results clean.
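The filtered query from the Execute Search Queries section (AR* restricted to the UK, sent with a query key in the api-key header) could be assembled from application code roughly as follows. This is a sketch under assumptions: the service name, index name, and function name are hypothetical, and the request is built but not sent.

```python
import urllib.parse
import urllib.request

API_VERSION = "2019-05-06"  # version used by the REST calls in this article


def build_search_request(service, index_name, query_key, search,
                         filter_expr=None):
    """Build a GET request against an index's docs endpoint."""
    params = {"api-version": API_VERSION, "search": search}
    if filter_expr:
        # $filter takes an OData expression such as: Country eq 'UK'
        params["$filter"] = filter_expr
    # quote (rather than the default quote_plus) encodes spaces as %20.
    query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    url = (f"https://{service}.search.windows.net/indexes/{index_name}"
           f"/docs?{query}")
    # A query key is enough here; no admin key is needed for searching.
    return urllib.request.Request(
        url, headers={"api-key": query_key}, method="GET")


if __name__ == "__main__":
    # Hypothetical service and index names.
    req = build_search_request(
        "my-search-service", "customers", "<query key>",
        "AR*", "Country eq 'UK'")
    print(req.full_url)
```

Letting urlencode handle the escaping avoids hand-assembling the query string, which matters once search terms or filter values contain spaces, quotes, or ampersands.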



Now choose to click on “Import data” as shown in Figure 1 and choose Azure Blob Storage as the data source. Choose to go with the default parsing mode, choose to extract Content and Metadata, and point it to wherever your image is located.

Click Next to add cognitive skills. Here’s where things become interesting. Under the Add enrichments section, choose to Enable OCR and merge all text into the merged_content field, as shown in Figure 5.

Figure 5: Enabling OCR

Notice the other capabilities you can tap into, as shown in Figure 6.

Figure 6: Additional cognitive skills

I know my data is a simple screenshot of this article with just text, so I’ll skip checking all those checkboxes. Depending upon your input data, which may be more than just images, feel free to check whatever seems fit.

Next, choose to customize the target index. I’ll simply accept the defaults, as shown in Figure 7.

Figure 7: The target index

Finally, choose to create the index.

Clicking the submit button causes the search engine to crawl the document. In my case, it’s a simple screenshot of one page of text, so it shouldn’t take too long. You can click the Refresh button you see in Figure 1 to keep up to date with your progress.

Once the search crawling is done, visit the search explorer and execute a search. For instance, I used the phrase “Seriously, the text you see here”, so let me just search for the word Seriously.

The search results can be seen in Figure 8.

Figure 8: Searching an image using OCR.

This is truly mind-blowing. I just did an OCR search of an image. And it doesn’t even begin to scratch the surface of the possibilities here. For instance, if you were a media company and all your photos, audio, and video were in Azure Blob Storage, just through a simple point-and-click, you could make all that media searchable.

Then you could issue a search query such as “Picture of Satya Nadella” and it’ll show pictures of Satya Nadella, assuming your media library had such pictures. Or you could search “a dog lying on grass” and it’ll match the pictures. Or you could even issue queries in English, and it’ll match non-English documents via the magical powers of AI-powered language translation.

Summary
Search is a great feature to have. Users find it useful. So much so that they almost demand to see it in your applications. But building a search engine is a non-trivial task. There are products out there that will help, but they’re expensive both to buy and to run. And they need hardware, expensive and powerful hardware. Azure Search eliminates all such complexity by providing you with a search-as-a-service solution, hosted in the cloud. This means that you can now bring the power of search into your applications with ease. Did you notice that none of the portal steps required writing code? Well, everything I showed can be done via the SDK or the REST APIs. That’s the gentle learning curve of Azure Search putting all that power in your hands with such ease.

And then you add AI to the mix and the power multiplies exponentially. Now you can search any kind of content. You can issue search queries in various languages. You can make sense of completely unstructured data. Have you ever run into a law firm saying, “we have so many documents and we wish we could search through them easily” and they wish to keep their data private?

Azure Search is your answer, and it’s an answer to many other commonly heard problems.

How will you use Azure Search? Do let me know.

Until next time, happy searching!

Sahil Malik



ONLINE QUICK ID 1911041

Synchronizing the In-Browser Database with the Server
Whether you’re building a traditional distributed system or an offline Web app, synchronizing data and reconciling conflicts are accompanied by some hard realities. Sometimes data gets stale, sometimes users update the same data simultaneously, and sometimes synchronization attempts fail. This article demonstrates how to gracefully resolve conflicts and synchronize disconnected databases. The examples explored in this article demonstrate how to work with the PouchDB API (Listing 1) as well as how to create a to-do list application that synchronizes with the server (Listing 2 and Listing 3). Figure 1 shows a screenshot of the running application. The application is available on GitHub at https://github.com/craigshoemaker/synchronize-dbs-demo.

Figure 1: Screenshot of running application

Craig Shoemaker
craigshoemaker.net
@craigshoemaker

Craig Shoemaker is a developer, author, speaker, and Senior Content Developer for Microsoft on the Azure Functions team. From building samples, internal tools, and writing articles, Craig helps developers around the world learn to build serverless applications.

As a Pluralsight author, Craig specializes in teaching JavaScript, HTML5, and IndexedDB.

In the future, Craig wants to learn how to tell a joke.

Different Databases in Different Contexts
CouchDB (http://couchdb.apache.org) is a server-side, multi-master document database that seamlessly synchronizes data among disconnected database instances. As data changes, a complete revision history for each document is stored, giving CouchDB the context to handle synchronization and resolve conflicts. As databases are synchronized, the revision history is used to decide which revisions prevail among the different versions. When dealing with conflicts, the revision information is used to allow users to select winning revisions.

A core aspect of CouchDB known as “eventual consistency” means that changes are incrementally replicated across the network. This same principle is at work when dealing with databases found inside a Web browser.

PouchDB (https://pouchdb.com) is a browser-based database interface that’s tailor-made to synchronize with CouchDB. In the same fashion as with multiple server instances of CouchDB, data from PouchDB synchronizes with server-side databases. This means that data manipulated in a disconnected state from the server can seamlessly flow up to the server.

PouchDB is a JavaScript implementation of CouchDB that uses IndexedDB, and, on rare occasion, Web SQL. The following similarities exist between PouchDB and CouchDB.

• The APIs are consistent. Although not identical, much of the code you write for PouchDB works directly against CouchDB.
• PouchDB implements CouchDB’s replication algorithm. The same rules that decide how data is synchronized across multiple database instances are enforced on the client as on the server.

14 Synchronizing the In-Browser Database with the Server codemag.com


• HTTP as a core transport. CouchDB exposes RESTful HTTP/JSON APIs that allow direct access to data. Exposing data through HTTP side-steps the data access layers often required to work with other databases. PouchDB capitalizes on this feature and sends JSON payloads via HTTP to interface directly with CouchDB.

Document Revisions
Synchronization is made possible by carefully tracking document revisions. Each document revision generates a unique identifier, known as the revision ID. There are two parts to a revision ID. The first part is a human-readable incrementing integer. The second part of the revision ID

Listing 1: Working with the PouchDB API


// Note: remoteDB, databaseRecord, and incomingRecord are not declared in this
// listing; they are assumed to be defined elsewhere in the demo application.
let localDB;

const api = {

  // Start from a clean local database, then seed it
  init: async () => {

    const databaseName = 'people';

    localDB = new PouchDB(databaseName);
    await localDB.destroy();

    localDB = new PouchDB(databaseName);
    console.log(localDB);

    api.seed();
  },

  add: async () => {
    const person = {
      _id: 'craigshoemaker',
      name: 'Craig Shoemaker',
      twitter: 'craigshoemaker'
    };

    const response = await localDB.put(person);
    console.log(response);
  },

  get: async () => {

    const person = await localDB.get('craigshoemaker');
    console.log(person);
  },

  update: async () => {
    const person = await localDB.get('craigshoemaker');
    console.log(person);

    person.github = 'craigshoemaker';

    const response = await localDB.put(person);
    console.log(response);
  },

  remove: async () => {
    const person = await localDB.get('craigshoemaker');
    console.log(person);

    const response = await localDB.remove(person);
    console.log(response);
  },

  getAll: async () => {

    const options = {
      include_docs: true,
      conflicts: true
    };

    const response = await localDB.allDocs(options);
    console.log(response);

    return response.rows;
  },

  syncer: {},

  sync: (live = true, retry = true) => {
    const options = {
      live: live,
      retry: retry
    };

    api.syncer = localDB.sync(remoteDB, options);

    // The sync handle lives on the api object, so listen on api.syncer
    api.syncer.on('change', e => console.log('change', e));
    api.syncer.on('paused', e => console.log('paused', e));
    api.syncer.on('active', e => console.log('active', e));
    api.syncer.on('denied', e => console.log('denied', e));
    api.syncer.on('complete', e => console.log('complete', e));
    api.syncer.on('error', e => console.log('error', e));

    // api.syncer.cancel();
  },

  resolveImmediateConflict: async (selectedSource) => {

    const record = /database/i.test(selectedSource) ?
      databaseRecord :
      incomingRecord;

    // get returns a Promise, so await it before changing the document
    const person = await localDB.get(record.id);
    person.title = record.name;

    const response = await localDB.put(person);
  },

  resolveEventualConflict: async (id, winningRevId) => {
    const options = { conflicts: true };

    // get item with conflicts
    const item = await localDB.get(id, options);

    // filter out wanted item
    let revIds = item._conflicts;
    revIds.push(item._rev);
    revIds = revIds.filter(conflictId => conflictId !== winningRevId);

    // an array of items to delete
    const conflicts = revIds.map(rev => {
      return {
        _id: item._id,
        _rev: rev,
        _deleted: true
      };
    });

    const response = await localDB.bulkDocs(conflicts);
  },

  seed: async () => {
    let counter = 0;

    await localDB.bulkDocs([
      {
        _id: 'jimnasium',
        name: 'Jim Nasium'
      },
      {
        _id: 'ottopartz',
        name: 'Otto Partz'
      },
      {
        _id: 'dinahmite',
        name: 'Dinah Mite'
      }
    ]);

    console.log('The local database is seeded');
  }
};



is a GUID-like value that’s generated by the database API. When you create a new document in the database, the revision ID is prefixed with the number 1 followed by a GUID-like value. In the following examples, a three-letter string is used instead of an actual GUID value to make the examples readable. When you create a document in the database, the first revision ID is generated, as shown in the following example.

1-abc
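To make the two-part structure of a revision ID concrete, here is a minimal sketch that splits a revision ID into its numeric prefix and its GUID-like suffix. The `parseRevisionId` helper and its property names are hypothetical and written only for illustration; they are not part of the PouchDB or CouchDB APIs.

```javascript
// Hypothetical helper (not part of PouchDB): splits a revision ID such as
// "1-abc" into its incrementing integer prefix and its GUID-like suffix.
function parseRevisionId(rev) {
  const match = /^(\d+)-(.+)$/.exec(rev);
  if (!match) {
    throw new Error(`Unexpected revision ID format: ${rev}`);
  }
  return {
    generation: Number(match[1]), // the human-readable incrementing integer
    hash: match[2]                // the GUID-like value
  };
}

const first = parseRevisionId('1-abc');
console.log(first.generation, first.hash);
```

Comparing the `generation` values of two revisions tells you only how many times each copy has been changed, not which copy should win, which is why conflict handling needs more than the numeric prefix.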

Listing 2: Implementing a synchronized todo list, database setup


const db = {

  local: new PouchDB('todos'),

  remote: new PouchDB('http://localhost:5984/todos', {
    skipSetup: true,
    auth: {
      username: 'smoothwookie',
      password: 'ThisIsMyPassword!',
    },
  }),

  _sync: {},

  listenForChanges() {
    this.local.changes({ since: 'now', live: true })
      .on('change', app.getAll)
      .on('error', console.log);

    this.remote.changes({ since: 'now', live: true })
      .on('change', app.getAll)
      .on('error', console.log);
  },

  synchronize(live = false, retry = true) {
    const options = {
      live: live,
      retry: retry
    };

    this._sync = this.local.sync(this.remote, options);

    this._sync.on('change', e => console.log('change', e));
    this._sync.on('paused', e => console.log('paused', e));
    this._sync.on('active', e => console.log('active', e));
    this._sync.on('denied', e => console.log('denied', e));
    this._sync.on('complete', e => console.log('complete', e));
    this._sync.on('error', e => console.log('error', e));
  },

  cancel() {
    this._sync.cancel();
  }
};

db.listenForChanges();

Listing 3: Implementing a synchronized todo list – using Vue.js


const app = new Vue({

  el: '#app',

  data() {
    return {
      localTodos: [],
      remoteTodos: [],
      localTitle: '',
      remoteTitle: '',
      isLiveSyncing: false
    }
  },

  created() {
    this.getAll();
  },

  methods: {

    async add(location) {

      const title = this.localTitle.length > 0 ?
        this.localTitle :
        this.remoteTitle;

      const todo = {
        _id: (new Date()).toISOString(),
        title: title
      };
      const response = await db[location].put(todo);
      console.log(response);

      this.localTitle = '';
      this.remoteTitle = '';
    },

    async get(location, _id) {
      return await db[location].get(_id);
    },

    async update(location, item) {

      const todo = await this.get(location, item._id);
      todo.title = item.title;
      const response = await db[location].put(todo);
      return response;
    },

    async remove(location, item) {
      const todo = await this.get(location, item._id);
      return await db[location].remove(todo);
    },

    async getAll() {

      const options = {
        include_docs: true,
        conflicts: true
      };

      const localData = await db.local.allDocs(options);
      this.localTodos = localData.rows;

      const remoteData = await db.remote.allDocs(options);
      this.remoteTodos = remoteData.rows;
    },

    manualSync() {
      db.synchronize();
    },

    liveSync() {
      db.synchronize(true);
      this.isLiveSyncing = true;
    },

    cancel() {
      db.cancel();
      this.isLiveSyncing = false;
    }
  }
});



Figure 2: Create a local instance of PouchDB.

As the document changes, the prefix is incremented by 1 and a new GUID is generated. Therefore, when you update the document, the revision ID prefix advances from 1 to 2.

2-def

Revision IDs are updated in this way in concert with any data changes. Even if you delete the document from the database at this point, the revision ID advances to 3 and the document metadata is marked as deleted. Tracking with revision IDs allows the database to maintain a full revision history of each document. By sustaining a running revision history for every document, the database has the context necessary to replicate changes among different database instances.

Working with PouchDB
To begin working with a database in the browser, you first need to reference the pouchdb.js script in your HTML page.

<script src="scripts/pouchdb.js"></script>

Next, inside a script tag or in a separate JavaScript file, create a new instance of PouchDB. The constructor accepts the database name.

const localDB = new PouchDB('people');

As you create a new instance of PouchDB, the resulting object either points to an existing database or it creates a new database for you. In this case, a new IndexedDB database is created in the browser. PouchDB uses one of a series of adapters to interface with different databases. If you inspect the localDB instance in the browser console, notice that the adapter, as shown in Figure 2, is set as idb. This indicates that in the browser, PouchDB is using the IndexedDB adapter.

PouchDB is architected with a Promise-based API that provides an opportunity to use JavaScript’s async/await syntax when calling methods. The following snippet demonstrates how to add a new object to the database by calling the put method.

const add = async () => {

  const person = {
    _id: 'craigshoemaker',
    name: 'Craig Shoemaker',
    twitter: 'craigshoemaker'
  };

  const response = await localDB.put(person);
  console.log(response);
};

The result returned from the database resembles an HTTP response. When the call succeeds, PouchDB returns an object containing ok: true, the document’s unique identifier, and the revision ID value.

{
  ok: true,
  id: "craigshoemaker",
  rev: "1-747b2b81bf8ef992e8ec1f44aa737c48"
}

Once you have the identifier and revision ID, you can access and manipulate the data as you wish. To retrieve a record from the database, you pass the document ID to the get method.

const get = async () => {

  const person =
    await localDB.get('craigshoemaker');

  console.log(person);
};

The response from the database includes the full document data including the unique identifier and revision ID.

{
  _id: "craigshoemaker",
  _rev: "1-747b2b81bf8ef992e8ec1f44aa737c48",
  name: "Craig Shoemaker",
  twitter: "craigshoemaker"
}



Updating data in the database requires that you have the latest revision ID associated with a specific document. Often, the most reliable way to reference the latest revision ID is to get the latest version of the document from the database just prior to updating values. To update the document, you can call the get method, add or update the object’s values, and then call the put method to persist changes to the database.

const update = async () => {

  const person =
    await localDB.get('craigshoemaker');

  person.github = 'craigshoemaker';

  const response = await localDB.put(person);
  console.log(response);
};

Once updated, the response from the database includes the new revision ID, as shown in the following code snippet.

{
  ok: true,
  id: "craigshoemaker",
  rev: "2-101931707fec4f12ff20776d94690c9f"
}

To retrieve a list of documents from the database, you use the allDocs method. The response from allDocs varies depending on the options you provide. In the following snippet, the include_docs: true option is set, which tells the method to return full document data along with the query. The default value for include_docs is false and when not enabled, the only information returned from allDocs is the _id and _rev values.

const getAll = async () => {

  const options = {
    include_docs: true
  };

  const response =
    await localDB.allDocs(options);

  console.log(response);

  return response.rows;
};
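Because allDocs wraps each document in a row, application code often needs to unwrap the rows before rendering them. The following is a minimal sketch of that step. The `docsFromAllDocs` helper is hypothetical, and `sampleResponse` is a hand-written illustration of the response shape (rows carrying id, key, a value with the revision, and a doc when include_docs is enabled), not captured output.

```javascript
// Hypothetical helper (not part of PouchDB): unwraps the documents from an
// allDocs response. Rows only carry a `doc` when include_docs is true.
function docsFromAllDocs(response) {
  return response.rows
    .filter(row => row.doc)
    .map(row => row.doc);
}

// Hand-written sample that mimics the shape of an allDocs response.
const sampleResponse = {
  total_rows: 1,
  offset: 0,
  rows: [
    {
      id: 'craigshoemaker',
      key: 'craigshoemaker',
      value: { rev: '2-101931707fec4f12ff20776d94690c9f' },
      doc: {
        _id: 'craigshoemaker',
        _rev: '2-101931707fec4f12ff20776d94690c9f',
        name: 'Craig Shoemaker',
        twitter: 'craigshoemaker',
        github: 'craigshoemaker'
      }
    }
  ]
};

console.log(docsFromAllDocs(sampleResponse).map(doc => doc.name));
```

In real code you would pass the value returned by `await localDB.allDocs({ include_docs: true })` in place of the sample object.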

Figure 3: Return value from the allDocs method

Figure 4: State of the document after removal



Figure 5: Create a remote instance of PouchDB

The response, as shown in Figure 3, includes a rows array that holds data from the database. Inside each element the id and key values are copied from the data document to make working with the data easier, and the entire document’s data is available via the doc property.

Removing a document from the database also requires reference to the unique identifier and latest revision ID values. The best way to get the latest values is to call get immediately before attempting to remove the document from the database.

const remove = async () => {

  const person =
    await localDB.get('craigshoemaker');
  const response =
    await localDB.remove(person);

  console.log(response);
};

The response from the database is reminiscent of the response returned from the put method. Here, you get back the document’s ID and a new revision number.

{
  ok: true,
  id: "craigshoemaker",
  rev: "3-70fb7e034b076663cd6861a46516c7f9"
}

Internally, the database hasn’t deleted your record, but has marked it as deleted by adding the _deleted property to the document. Figure 4 shows how a deleted record appears in the database.

In fact, if you tried to create a new document in the database with the same primary key value, instead of getting an entirely new revision ID, the database returns a revision ID incremented from the deleted state. The following snippet shows the database’s response after creating a new document with the same ID as the previously deleted document.

{
  ok: true,
  id: "craigshoemaker",
  rev: "4-ffc5ec971505cfb9b37318877441e646"
}

The revision ID starts with a 4 instead of a 1, even though a new document is inserted into the database. Building on these API basics, you can begin synchronizing data between two databases.

Synchronizing with the Server
To synchronize with the server, you first need to create an instance of PouchDB in the client script that points to the server-side database. By providing PouchDB with a URL and authorization credentials, the browser creates a secure connection to the remote database.

const remoteDB = new PouchDB(
  'http://localhost:5984/people',
  {
    skipSetup: true,
    auth: {
      username: 'account_user_name',
      password: 'secret_password',
    }
  });

When you create an instance of PouchDB against the server, the adapter used is http, as shown in Figure 5. This means that each call to the PouchDB API is ultimately expressed as an HTTP call to CouchDB over the network. The benefit to you is that your application code remains unchanged regardless of whether your commands are against the local database or the server.

This example uses the PouchDB Authentication (https://github.com/pouchdb-community/pouchdb-authentication) plugin to handle authentication with the remote server. The plugin allows you to add options to the constructor that authenticate your connection to the server.
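Since a local and a remote PouchDB instance expose the same get and put methods, a data-access function can take the database as a parameter and stay unaware of where the data lives. The sketch below illustrates that idea with the get-then-put update pattern. The `upsert` helper is hypothetical, and it assumes that a missing document is reported as an error whose status is 404.

```javascript
// Hypothetical helper (not part of PouchDB): creates the document if it is
// missing, otherwise merges the changes into the latest revision.
// `db` is assumed to be any PouchDB instance, local or remote.
async function upsert(db, id, changes) {
  let doc;
  try {
    doc = await db.get(id);          // includes the latest _rev
  } catch (error) {
    if (error.status !== 404) {
      throw error;                   // only "not found" is handled here
    }
    doc = { _id: id };               // start a new document
  }
  return db.put({ ...doc, ...changes });
}

// Usage, assuming localDB and remoteDB were created as shown earlier:
//   await upsert(localDB, 'craigshoemaker', { github: 'craigshoemaker' });
//   await upsert(remoteDB, 'craigshoemaker', { github: 'craigshoemaker' });
```

The same function body serves both databases; only the instance passed in decides whether the call goes to IndexedDB or over HTTP.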



Once you have instances of PouchDB that point to both the in-browser database and the server, you can then begin to synchronize data between the two.

The code required to handle synchronization accepts a few options. During synchronization, you can create a persistent live connection and choose to retry failed attempts. The following example creates a function that sets up synchronization between the local and remote databases.

let syncer = {};

const sync = (live = true, retry = true) => {

  const options = {
    live: live,
    retry: retry
  };

  syncer = localDB.sync(remoteDB, options);

  syncer.on('complete', e => {
    // handle complete
  });

  syncer.on('error', e => {
    // handle error
  });
};

The syncer object is declared outside the sync function so that you have access to the synchronization instance throughout the application. The arguments defined in the function allow you to select whether you want to establish a live connection and whether you want to retry failed synchronization attempts.

As the databases are synchronized, data flows in both directions. Data added to the remote database is replicated to the local database, and vice versa. Ultimately, the sync method is a wrapper for CouchDB’s underlying replication feature. As data is replicated among individual databases, conflicts are not just a possibility, but an inevitability.

Managing Conflicts
Dealing with conflicting data sits at the heart of any attempt to synchronize databases. Embracing the inevitability of conflicts, the CouchDB and PouchDB APIs make conflict management a first-class concern. The dual nature of the revision ID allows the database to resolve different types of conflicts. Conflicts are managed by continuously evaluating the revision ID during any operation that manipulates data. There are at least two different types of conflicts that arise when using PouchDB.

Immediate and Eventual Conflicts
An immediate conflict arises when you attempt to save changes to a document, but the revision ID provided is older than what’s in the database. For instance, let’s say a new record added to the database results in a revision ID of 1-aa1. As the document is updated, the revision ID becomes 2-aa2. If the first revision of the document (1-aa1) is cached somewhere and the user tries to persist that version of the document while the database holds a newer version, an immediate conflict is encountered.

To handle conflicts, any operation that manipulates data should be wrapped in a try/catch block, giving you the chance to handle conflicts.

try {
  const response = await db.put(person);
} catch(error) {
  if(error.name === 'conflict') {
    // handle conflict
  } else {
    // handle other error
  }
}

The following code snippet shows the error object returned from the database during an immediate conflict. As conflicts are encountered, PouchDB returns a 409 (conflict) error.

{
  "status": 409,
  "name": "conflict",
  "message": "Document update conflict",
  "error": true,
  "id": "2019-06-08T12:33:00.169Z",
  "docId": "2019-06-08T12:33:00.169Z"
}

The easiest way to resolve this conflict is to fetch the document’s latest version, update the required values, and then attempt to save the document again.

Figure 6: Conflicts array in a document



By contrast, an eventual conflict happens when a revision ID is mismatched during a synchronization attempt. Consider the situation when an existing document is updated in the browser and the resulting revision ID becomes 2-bb1. Then the same document is updated directly on the server, and that copy’s revision ID becomes 2-bb2. The document is updated for the second time in both locations, but the disconnected databases are unaware of each other’s change. Eventually, the databases will synchronize and the conflict for this document must be resolved.

The CouchDB replication logic handles conflicts seamlessly. As databases are synchronized, the replication algorithm automatically selects a revision as the winner for you. As the winning version is selected, the metadata of the document is flagged as being in a conflicted state and is associated with an array of revision IDs that represent the conflicted versions.

When you retrieve data from the database, you have the option to request conflicts associated with a specified document. The following example demonstrates how you can request conflicts when calling the allDocs method.

const options = {
  include_docs: true,
  conflicts: true
};

const response = await localDB.allDocs(options);
console.log(response);

This code returns a collection of documents from the database, each of which includes an array of revision IDs that conflict with the current version of the document. Figure 6 shows a document in a conflicted state.

Storing conflicting revision IDs as document metadata allows your applications to always be aware of conflicts. By writing conflict-aware code, you can allow users to resolve conflicts early and often by giving them a chance to decide which version is ultimately the winner.

Resolving eventual conflicts involves fetching documents from the database with the associated conflict data. Conflict data is not returned by default, so when you call the get method, you need to enable the conflicts option. Once data is returned from the database, you can allow the user to designate which revision is the desired version.

The following example extracts a document from the database with conflict information. The revision IDs are evaluated to find the winning revision and the database is updated to mark all revisions as deleted except the winning revision.

// get item with conflicts
const item = await localDB.get(
  id,
  {
    conflicts: true
  });

// filter out item you want to keep
let revIds = item._conflicts;

revIds.push(item._rev);

revIds = revIds.filter(
  conflictId => conflictId !== winningRevId);

// delete the rest of the items
const conflicts = revIds.map(rev => {
  return {
    _id: item._id,
    _rev: rev,
    _deleted: true
  };
});

const response = await localDB.bulkDocs(conflicts);

The call to get includes the document ID and an options object where conflicts is set to true. This tells the database to fetch the document with the matching ID and return the conflicting revision IDs in an array named _conflicts. Next, the revision IDs are isolated into a variable named revIds. The current winning revision ID is added to the array with revIds.push and then the user-selected winning version is filtered out of the revision IDs array. Now the revIds array only contains values of revision IDs that aren’t selected by the user as the winning document version. These revisions are meant to be deleted from the database.

The map method is used to transform the revIds array into an array named conflicts. This becomes an array of objects, each of which includes the unique identifier, the losing revision ID, and the _deleted property set to true. This object array is then passed to bulkDocs to update the database revisions simultaneously.

Conclusion
Built as a multi-master database from the ground up, CouchDB makes conflict resolution a first-class concern. The replication logic, which powers synchronization, is robust enough to recognize conflicts, temporarily select winning versions, and provide the context necessary to allow users to decide how to resolve conflicted data. In the browser, PouchDB is a JavaScript implementation of CouchDB and makes it easy to carry out not only simple data operations but also to synchronize data from the browser to the server.

Craig Shoemaker

Painless IndexedDB
Even if you don’t have the need to synchronize data with a remote database, PouchDB is still a great library for working with IndexedDB. The API for IndexedDB is notoriously difficult to use and based on callbacks and events rather than Promises. PouchDB makes adding, editing, and deleting data from IndexedDB easy and comes complete with modern JavaScript syntax support.
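As a supplement to the eventual-conflict walkthrough in this article, the selection of losing revisions can be isolated into a function that doesn’t touch the database, which makes the logic easy to check on its own. The `buildConflictDeletions` name is hypothetical; the input is assumed to be a document fetched with the conflicts option enabled, and the result is the array you would hand to bulkDocs.

```javascript
// Hypothetical helper (not part of PouchDB): given a document fetched with
// { conflicts: true } and the revision the user chose to keep, returns the
// deletion stubs for every other revision without mutating the document.
function buildConflictDeletions(item, winningRevId) {
  const candidates = [item._rev, ...(item._conflicts || [])];

  return candidates
    .filter(rev => rev !== winningRevId)
    .map(rev => ({
      _id: item._id,
      _rev: rev,
      _deleted: true
    }));
}

// Hand-written sample document in a conflicted state.
const conflicted = {
  _id: '2019-06-08T12:33:00.169Z',
  _rev: '2-bb2',
  _conflicts: ['2-bb1'],
  title: 'Buy milk'
};

// The user keeps 2-bb1, so the current winner 2-bb2 is marked for deletion.
console.log(buildConflictDeletions(conflicted, '2-bb1'));

// Usage, assuming localDB was created as shown earlier:
//   await localDB.bulkDocs(buildConflictDeletions(item, winningRevId));
```

Copying the revision IDs into a new array, rather than pushing onto item._conflicts, leaves the fetched document unchanged in case other code still reads it.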



ONLINE QUICK ID 1911051

Get Started with Serverless Azure Functions
“Serverless” is a hot tech term that doesn’t quite mean what it says. That’s because there truly is a server hosting serverless apps and functions. The point is that it feels like there’s no server because you don’t have to handle or manage one or worry about things like scaling because it’s all done for you. Serverless functions aren’t simply Web services that you’re hosting in the cloud. The functions are event-driven and have a beautiful way to orchestrate a variety of services through configurable triggers and bindings, reducing the amount of code you have to write. You can just focus on the logic you’re trying to achieve, not on the effort of wiring up the orchestrations.

Julie Lerman
thedatafarm.com/blog
@julielerman

Julie Lerman is a Microsoft Regional Director, Docker Captain and a long-time Microsoft MVP who now counts her years as a coder in decades. She makes her living as a coach and consultant to software teams around the world. You can find Julie presenting on Entity Framework, Domain Driven Design, and other topics at user groups and conferences around the world. Julie blogs at thedatafarm.com/blog, is the author of the highly acclaimed “Programming Entity Framework” books, the MSDN Magazine Data Points column and popular videos on Pluralsight.com.

HTTP requests, like a Web service, are just one type of event that can trigger your function to run. You can also wire functions up to other triggers, such as listening for changes in an Azure Cosmos DB database or a message queue. Other tasks you might want to perform can also be specified through configurable bindings that also don’t require code. For example, you can use an input binding to retrieve some data from a database that your function needs to process. Output bindings, again defined with configurations rather than code, let you send results to another service. Your function only needs to create the results in code and the binding will ensure that those results get passed on to their destination. A single function could be triggered by a write to your Cosmos DB database, then use an input binding to gather relevant data from the database, and then use a message queue output binding to update a cache. If you don’t need any additional logic, the function is totally defined by the trigger and bindings.

All of these bindings and triggers remove many of the redundant tasks that you might otherwise have to perform and they allow you to focus on the logic of the function. And the Azure Functions service takes care of all of the server-related problems tied to hosting. Integration with Application Insights lets you monitor your functions to observe how they are performing and being used.

The Structure of Azure Functions
The structure of Azure Functions is defined by a Function App that hosts one or more related functions. The app has its own settings that are secure by default. This is a good place to store details like connection strings, credentials, and more. Then each function within the app is a self-contained set of triggers and bindings with its own additional settings. The only things the functions share are the subdomain URL and the app settings. Figure 1 shows some of the Function Apps in my subscription. I’ve expanded the DataAPINode app so you can also see the three functions I created for that app.

Preparing Your Environment
Although it’s possible to create functions directly in the portal, both Visual Studio and Visual Studio Code have extensions that make it easy to develop, debug, and deploy Azure Functions. I’ll use VS Code and its Azure Functions extension.

I found an easy inspiration for a new function. I often need to know how many words I’ve written for things like conference abstract submissions, etc. I find myself copying the text into Microsoft Word to get that count. You’ll get to create your own function that returns character and word counts for a given bit of text.

I’ll be using Visual Studio Code along with its Azure Functions extension and a few other related extensions. If you haven’t used Visual Studio Code before, I invite you to install it (for free on macOS, Linux, or Windows) to try it out as you follow along. VS Code is cross-platform and is a breeze to install. (Go to code.visualstudio.com to install and learn more.) You can use Visual Studio 2017 or 2019, which has a similar extension built into the Azure workload. The VS extension doesn’t have the same workflow, however. You can see how to get started with that in the docs and then come back to walk through the functions built in this article.

In VS Code, start by installing the Azure Functions extension through the Extensions icon in VS Code’s Activity Bar (along the left side of the IDE). A prerequisite of the Azure Functions extension is that you install the Azure Functions Core Tools. There are links to the OS-specific installers in the Prerequisites section of the extension details (marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-azurefunctions). I’ll warn you now that even for Windows, it is an npm install, but it’s quick and painless.

You don’t need an account to use this extension unless you plan to follow the deployment test later on. Note that if you have a Visual Studio Subscription, you have an Azure account. If you have neither, you can get a free account at https://azure.microsoft.com/en-us/free/.

Creating Your First Function
I began with a parent folder named CodeFunctions, in case I want to add more functions later. Then I created a subfolder called WordCount. Open VS Code in the WordCount folder.

Next, you’ll use the Azure Functions extension to turn the WordCount folder into an Azure Functions project.

Click the Azure icon in the Activity Bar to show the Azure explorer. You’ll see a window for Azure Functions. If you’re logged into an Azure account, the Azure Functions explorer will also show you the Function apps in your account (Figure 2). I have a lot of demo function apps in my subscription. To save resources in my account, I’ve stopped them all while they’re not actively in use.

22 Get Started with Serverless Azure Functions codemag.com


Hover over the Functions toolbar (as I’ve done in Figure 2) to see the extension’s icons appear. The folder icon in the upper right-hand corner is to create an Azure Functions project in a folder. The bolt is to create a function inside the project. The up arrow is to deploy your function to the cloud, and the last one is a standard refresh icon.

Click the folder icon to create an Azure Functions project. You’ll get a number of prompts to walk you through the project setup.

1. Select the target folder for the project: Browse to the WordCount folder if needed.
2. Select the language for your project from a drop-down list: I’m choosing C# but other current options are JavaScript, TypeScript, Java, and previews for Python and PowerShell.
3. Select a runtime: This prompt will show only if the core tools aren’t in your system’s PATH. Choose Azure Functions v2 to build a .NET Standard-based function.
4. Select a template: Choose HttpTrigger.
5. Provide a function name: I named my function WordCount.
6. Provide a namespace: I entered CodeMagFunctions.WordCount.
7. Select Access Rights: For this demo, I’m using Anonymous to make things easy.

That’s it. The extension will then build out the assets needed for this project.

You’ll see a new “stake in the ground” HttpTrigger function file in the editor and the assets from the template in the solution explorer (see Figure 3). The full code listing is in Listing 1. Notice that there are four files in the .vscode folder inside the parent (CodeFunctions) folder. The extensions.json file is a VS Code recommendation file (code.visualstudio.com/docs/editor/extension-gallery#_workspace-recommended-extensions) to ensure that anyone who also works on this source will be alerted about installing required extensions. The WordCount folder has a few files in addition to the csproj and cs files. The most important of these new files is local.settings.json. This is where you store the local version of function app settings, like connection strings and credentials, if you’re using them. I won’t be doing any of that in this demo. When you have a Function App in Azure, which is made up of one or more functions, there’s a set of application-level settings shared by all of its functions and you have to add them explicitly. That’s what these settings align with.

Figure 1: Some Azure Apps and Functions in the Azure Portal for my account

Figure 2: The Azure Functions explorer listing my live functions as well as icons for working with local code

Testing the Default Function
Before modifying our function for counting words, let’s see

debug icon in the Activity Bar, you’ll see that the debug configuration is set to this. To debug the function, you can either click the green arrow next to that dropdown or press F5.

When you run the function, the Azure Functions SDK starts by calling dotnet build on the project. Then it runs a command from its own Command Line Interface (CLI): func host start. After this, you’ll see the Azure Functions banner (Figure 4), which is a good clue that things are working.

Following some status messages, you’ll then see in yellow text:

Http Functions: WordCount: [GET,POST] http://localhost:7071/api/WordCount

The SDK will continue to perform tasks and report them in the terminal. Wait until it’s complete and displays the final message “Host lock lease acquired by instance ID …” Now it is ready for you to try it out. You can CTRL-Click (or CMD-Click) the URL, which will open your browser, where you’ll see the message “Please pass a name on the query string or in the request body”. That’s because you still need to provide either a query parameter or a body. Modify the URL to include the name parameter, e.g., http://localhost:7071/

Listing 1: Default function code created by the HttpTrigger template
1 public static class WordCount
2 {
3   [FunctionName("WordCount")]
4   public static async Task<IActionResult> Run(
5    [HttpTrigger(AuthorizationLevel.Anonymous,"get",
6 "post", Route=null)] HttpRequest req,ILogger log)
7   {
8    log.LogInformation("C# HTTP trigger function
the template logic in action. The full listing of the file, ex- 9 processed a request.");
cluding using statements, is shown in Listing 1. This func- 10
tion takes an incoming string, assuming it’s someone’s 11     string name = req.Query["name"];
name, and responds with “Hello,“ to that name. The func- 12     string requestBody = 
13 await new StreamReader(req.Body).ReadToEndAsync();
tion is flexible, checking for the name both in the HttpRe- 14     dynamic data = JsonConvert.DeserializeObject(
quest query parameters and in a request body (if one exists 15 requestBody);
and has a property called “name”). 16     name = name ?? data?.name;
17     return name != null
Running this function isn’t the same as simply running a 18       ?(ActionResult)new OkObjectResult($"Hello,{name}")
19       : new BadRequestObjectResult(
.NET Core app because it has to run the Azure Functions 20 "Please pass a name on the query string or
logic. The template that created the function project also 21 in the request body");
created a VS Code Launch Configuration called “Attach to 22    }
.NET Functions” in the launch.json file. If you click on the 23 }

codemag.com Get Started with Serverless Azure Functions 23


api/WordCount?name=Julie. The browser will then display I want the function only to read a request body, so I’ll re-
the response sent back from the function, which, in my case, move the code to read a query parameter.
is “Hello, Julie” as you can see in Figure 5.
11 //string name = req.Query[“name”];
When you’re finished testing the function, return to the VS
Code terminal and press CTRL-C to stop the function. And further down, I only need to read the data from the
request body. While I’m at it, I’ll change the variable names
from name to text.
Transforming the Template Logic
Now let’s modify the file created by the template. The key change old 16 //name = name ?? data?.name;
will be adding logic to perform character and word counts. new 16 var text =  data?.text;
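The change planned above (read a text property from a JSON request body, then count its characters and words) can be tried outside the Functions host first. The console program below is only a sketch under stated assumptions: it uses System.Text.Json (part of .NET Core 3.0 and later) instead of the Json.NET call the template generates, and the TextCounter, ReadText, and Count names are hypothetical helpers for illustration, not part of the Azure Functions SDK.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper for illustration; not part of the Azure Functions SDK.
public static class TextCounter
{
    // Returns the "text" property of a JSON object body, or null if it is missing.
    // Malformed JSON is not handled here; JsonDocument.Parse would throw.
    public static string ReadText(string requestBody)
    {
        if (string.IsNullOrWhiteSpace(requestBody)) return null;
        using (JsonDocument doc = JsonDocument.Parse(requestBody))
        {
            return doc.RootElement.ValueKind == JsonValueKind.Object
                && doc.RootElement.TryGetProperty("text", out JsonElement value)
                && value.ValueKind == JsonValueKind.String
                ? value.GetString()
                : null;
        }
    }

    // Counts characters, non-space characters, and whitespace-delimited words.
    public static (int Chars, int NonSpaceChars, int Words) Count(string text)
    {
        char[] delimiters = { ' ', '\r', '\n' };
        int words = text.Split(delimiters, StringSplitOptions.RemoveEmptyEntries).Length;
        return (text.Length, text.Replace(" ", "").Length, words);
    }

    public static void Main()
    {
        string body = "{ \"text\": \"Hey, I built a serverless function!\" }";
        string text = ReadText(body);
        var (chars, nonSpace, words) = Count(text);
        Console.WriteLine($"Chars: {chars}, no spaces: {nonSpace}, words: {words}");
    }
}
```

For the sample sentence, this prints Chars: 35, no spaces: 30, words: 6. Splitting with RemoveEmptyEntries keeps runs of spaces or line breaks from being counted as empty words.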

Figure 3: Initial result of creating a function project using the Azure Functions extension

Figure 4: The Azure Functions logo displayed in the terminal when the SDK is running properly

Figure 5: The response from the default HttpTrigger function

Now it’s time for the logic that reads the text and creates an output with the character and word counts. Rather than creating yet another Azure Function to perform that task, I first added a class called DocObject to encapsulate the results of the analysis.

    public class DocObject
    {
        public string Text { get; set; }
        public int WordCount { get; set; }
        public int CharCount { get; set; }
        public int NonSpaceCharCount { get; set; }
    }

Then I added a method (AnalyzeText) to the function’s class.

    private static DocObject AnalyzeText(string text)
    {
        var charsLen = text.Length;
        // Empty input: return a result whose counts are all zero.
        if (charsLen == 0) return new DocObject { Text = text };
        var noSpacesLen = text.Replace(" ", "").Length;
        char[] delimiters = new char[] { ' ', '\r', '\n' };
        var wordCount = text.Split(delimiters,
            StringSplitOptions.RemoveEmptyEntries).Length;
        var docObject = new DocObject
        {
            Text = text,
            WordCount = wordCount,
            CharCount = charsLen,
            NonSpaceCharCount = noSpacesLen
        };
        return docObject;
    }

These both go after the Run method. I had some help from https://stackoverflow.com/questions/8784517/counting-number-of-words-in-c-sharp to find an efficient way to count words that takes some punctuation into account.

Finally, you’ll create a string from the results of AnalyzeText to send back in the function’s result. Here is the new listing for the Run method of the WordCount function. I’ve removed the ILogger parameter in the method’s signature along with the log.LogInformation call from the original logic.

    [FunctionName("WordCount")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Anonymous,
            "get", "post", Route = null)]
        HttpRequest req)
    {
        string requestBody = await new StreamReader(req.Body)
            .ReadToEndAsync();
        dynamic data = JsonConvert.DeserializeObject(requestBody);
        string text = data?.text;
        // Without a text property there is nothing to analyze.
        if (text == null)
        {
            return new BadRequestObjectResult(
                "Please include text to analyze");
        }

        DocObject docResults = AnalyzeText(text);
        var httpResponseText = $@"CountText results:
    Total Chars: {docResults.CharCount}
    Chars No spaces: {docResults.NonSpaceCharCount}
    Word count: {docResults.WordCount}";
        return new OkObjectResult(httpResponseText);
    }

If you want to be sure that you haven’t made any typos, you can run dotnet build from the terminal window. I think that’s always a good idea.

Debugging the WordCount Function
One of the great features of the Azure Functions extension in VS Code, as well as in Visual Studio, is that because you’re writing the function in the IDE, you can take advantage of all of the debugging features the IDE provides. If you’re new to VS Code, you may not realize that it has a very rich debugging experience with breakpoints, watch variables, CodeLens, and more. So, if you find that your function isn’t doing what you expect, you can set a breakpoint and debug as you would any other code.

Because the function now relies on a request body, you’ll need a way to compose the body and include it with the HTTP request to the function for testing. You may already be a wiz with tools like Fiddler, Postman, or the browser developer tools. I’m a fan of another VS Code extension called REST Client by Huachao Mao.

SIDEBAR: REST Client Extension
The Visual Studio Code REST Client extension by Huachao Mao (marketplace.visualstudio.com/items?itemName=humao.rest-client) has over 2.5 million downloads, so I’m not alone in my fandom. If you want to try it, go ahead and install the extension and follow my lead. Otherwise, use the tool you’re familiar with.

I created a new folder in the .vscode folder called RESTClientTesters to keep these out of my function project. Add a new file called WordCountTest.http. The .http file extension isn’t required for the REST client to work. I just use it to identify my REST client files. Enter an HTTP request, defining the method, headers, and body. Here’s the simple request I’m using to start. The request body itself is the last three lines.

    POST http://localhost:7071/api/WordCount HTTP/1.1
    content-type: application/json

    {
        "text": "Hey, I built a serverless function!"
    }

Before you can test the request, you’ll need to get the function running again. Put some breakpoints into the code if you want to step through to watch the logic in action, then use F5 to run it in debug mode. Remember to wait for the “Host lock lease acquired” message to know that the function is ready to accept requests. Then, with the WordCountTest.http editor window active, you can send the request. You can use keystrokes (Windows: CTRL-ALT-R; macOS: Opt-Cmd-R) or press F1 and select Rest Client: Send Request. The extension will open a new window to display the results returned from the function. The

result shown in Figure 6 tells me that my text has 35 characters, 30 without spaces, and the word count is six.

Figure 6: Using the REST Client extension to test the function

Deploying My New Function to the Cloud
For my new function to be truly useful, I’ll need to deploy it to Azure. Even with 30 years of software experience, the term “deploy” still makes my heart rate go up a little. Luckily, both Visual Studio and VS Code make it easy. Remember the “upload” icon in the Azure Functions explorer shown in Figure 2? As long as you’re logged into your account, there’s not much to deploying this function. In my case, I’ll need to ensure that the Function App is created first and then the WordCount function inside of it. Also, keep in mind those local settings I pointed out earlier. For this beginner function, I didn’t do much that involved settings, such as defining bindings or providing connection strings or credentials. You’ll get a chance to upload your local settings at the end of the deployment process.

Go ahead and click the upload button. You’ll follow a series of prompts as you did when creating the function. First, create a new Function App in Azure (as opposed to adding it to an existing function app). You’ll see that there’s also an Advanced version of this option but choose the simple version for this demo. You’ll need to provide a new name for the function app. The prompt asks for a “globally unique” name. That doesn’t just mean unique to your account, but to anyone’s account in the whole world. That’s because the final URI will be a subdomain of azurewebsites.net. I chose CodeMagFunctions for this example. The simple version of “Create new Function App” doesn’t give you the chance to select a resource group or choose the location. The advanced option lets you specify these and additional settings for the new Function App. You can also modify settings in the Azure Portal after the fact.

After the extension creates the new Function App, it zips up the compiled function and pushes it up to Azure. You’ll get some status reports and then a notification when it’s done. At the end, you’ll be prompted to upload the settings. I didn’t need to do that, so I just closed that prompt window.

Because it’s a zip file deployment, the function code will be read-only in the portal. You’ll get a message about that with guidance to change an app setting if you want to edit directly in the portal. Essentially, if you’re creating these in VS Code or VS, the assumption is that you will make any changes in your IDE and then re-deploy the updated function.

Figure 7: Response from running the second POST request with the REST Client extension

Figure 8: Cosmos DB document created by the Cosmos DB output binding
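Once a Function App is deployed, its HTTP-triggered functions can be called from any HTTP client, not only from a browser or an editor extension. As a rough sketch, the console program below posts a JSON body to a function URL with HttpClient. The FunctionCaller class and BuildRequest helper are hypothetical names for illustration, System.Text.Json is assumed to be available (.NET Core 3.0 or later), and the default URL is the local one used earlier in this article; substitute whatever URL your own deployment reports.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical client for illustration; these names are not from the article.
public static class FunctionCaller
{
    // Builds a POST request whose JSON body carries the text to analyze.
    public static HttpRequestMessage BuildRequest(string functionUrl, string text)
    {
        string json = JsonSerializer.Serialize(new { text });
        return new HttpRequestMessage(HttpMethod.Post, functionUrl)
        {
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
    }

    public static async Task Main(string[] args)
    {
        // Assumed default: the local Functions host. Pass a deployed URL as the first argument.
        string url = args.Length > 0 ? args[0] : "http://localhost:7071/api/WordCount";

        using (var client = new HttpClient())
        using (HttpRequestMessage request = BuildRequest(url, "Hey, I built a serverless function!"))
        using (HttpResponseMessage response = await client.SendAsync(request))
        {
            // Print the numeric status code, then the response body.
            Console.WriteLine((int)response.StatusCode);
            Console.WriteLine(await response.Content.ReadAsStringAsync());
        }
    }
}
```

Run against a function that is up and listening, this should print the HTTP status code followed by whatever text the function returns. Keeping the request construction in its own method makes it easy to point the same code at the local host or at a deployed URL.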



My function was published into the new Function App and the notification tells me the URI is https://codemagfunctions.azurewebsites.net/api/WordCount. You can continue to use the REST Client or your HTTP tool of choice to test out your live function. Note that I probably won’t leave my function public for long to avoid eating up the Azure credits associated with my Visual Studio subscription.

To test from VS Code, I added a new POST command to the WordCountTest.http file that points to the new URL. Then I was able to select that full command (lines 8 through 13), press F1, and run the REST Client Send Request command again. The extension only ran the request I selected in the file. Figure 7 shows the new POST message and the response.

Next Steps: Try Out Some Bindings!
I originally started my Azure Functions journey by building and testing them directly in the portal where you can easily add trigger, input, and output bindings by clicking on UI elements. In the same time frame that my comprehension of how the functions worked increased, the Azure Functions extension for VS Code also evolved. I eventually transitioned to using VS Code with the extension to build, debug, and deploy the functions. I’ve also created a number of functions that read data from Azure Cosmos DB with input bindings, store data into Cosmos DB with output bindings, and even send text messages with an output binding for the Twilio API. You can see all of this in action in a recorded conference session I gave at Oredev in 2018, which is available on YouTube at https://youtu.be/fp9bB3L5utM.

I won’t detail how to do this in this already lengthy article, so here’s a quick look at what the bindings look like for a function with a Cosmos DB output binding. This does require an Azure account and an existing Cosmos DB account. See this documentation to create a Cosmos DB account: https://docs.microsoft.com/en-us/azure/cosmos-db/how-to-manage-database-account. In C#, the bindings are described, like the trigger in our WordCount function, as attributes on the Run method’s parameters. This is quite different from how they’re configured in a JavaScript or C# script-based function, where the configurations live in a JSON file.

I’ve modified the WordCount function’s Run method to include a CosmosDB output binding attribute that will store the documents into a container called Items in my database with a connection string now defined in local.settings.json:

    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Anonymous,
            "get", "post", Route = null)]
        HttpRequest req,
        [CosmosDB(
            databaseName: "WordCounts",
            collectionName: "Items",
            ConnectionStringSetting = "CosmosDBConnection",
            CreateIfNotExists = true)]
        ICollector<DocObject> docs)
    {

There are three ways to ensure that the data intended for the output binding is discovered by the binding. One is to create an out parameter in the signature. Because I’m using an asynchronous method, that’s not possible with C#.

Another way to make sure the output binding is discoverable is to return the value. But I am already using return to send the HttpResponse. The third way is to use an ICollector object. This is also needed if you want to send back multiple documents. But I’m using the ICollector pattern to return a single document, as it solves the problem.

After my code has received the populated DocObject in docResults, I’ll add that to the docs ICollector object:

    docs.Add(docResults);

That’s all I need to do. The output binding takes care of the rest, which is one of the truly amazing benefits of the bindings. Now when I run my function, not only do I get the HttpResponse, but the document shown in Figure 8 is added to my Cosmos DB database (which, by the way, was created on the fly thanks to the CreateIfNotExists setting).

Summing Up
There’s so much more to learn about using Azure Functions and how you can use them as part of your production solutions, not just little demo apps. I love working with tools that aren’t just productive but are a joy to use; the combination of Azure Functions, Visual Studio Code, and the Azure Functions extension definitely falls into this category!

Julie Lerman



ONLINE QUICK ID 1911061

Women in STEM, an Interview


Sumeya Block
Sumeya is a passionate writer, lover of creative expression, and a recent Teen Tix press corps writer. She’s currently in her sophomore year of high school and spends most of her time going to poetry slams, writing art reviews, and speaking at events. She’s been published in The Evergrey, Teen Tix blog, and in the Poetry on Buses contest. She has also presented at CppCon and .NET Fringe. When Sumeya isn’t running around, she enjoys bingeing Netflix, reading books, and attending social events.

Sara Chipps
sarajchipps@gmail.com
Sara’s an engineering manager at Stack Overflow and the cofounder of http://Jewelbots.com. Sara was formerly the CTO of http://FlatironSchool.com and in 2010, she cofounded Girl Develop It, a non-profit focused on helping women become software developers.

Sumeya Block is a high school student who’s discovered that coding is creative, builds communities, and provides an excellent platform for activism. Through her explorations, she’s made a good friend of JavaScript coder Sara Chipps. The two of them interviewed each other and they’re letting us listen in. During this interview, we learn more about Sara’s own encounters, advice, and work, and how what she had to say was inspiring to Sumeya, and can be to all of us. And we can also pick up that sense of wonder and excitement from Sumeya’s infectious interest. Here’s Sumeya’s introduction to the interview.

“When I think of inspiring women who are making a difference in the tech world, a few women come to mind. One is Sara Chipps, JavaScript lover, and co-founder of Girl Develop It, where women can learn computer programming skills online. She’s currently at Jewelbots (which she also co-founded). Jewelbots launched on Kickstarter just over six years ago and, since then, has been committed to getting girls interested in STEM fields. Jewelbots currently sells two products. One is a Jewelbits science kit that sparks creativity through DIY neon-colored light-up signs. The second is a programmable friendship bracelet that can be used to talk to friends through Morse code. It can light up when paired with other bracelets and do even more as the users develop their coding skills.

“When I called Sara on a rainy New York night, she talked admiringly about the women she works with and mentors. She talked about how she and others (not just other women) can support their female coworkers by stressing the importance of reaching out and sharing opportunities. Talking to Sara, I learned about why she continues working to create spaces for girls and women to learn about STEM. I learned more about her own encounters, advice, and work. And what she had to say was inspiring.”

Sara’s Answers as Asked by Sumeya
When did you first become interested in tech and was there a moment where you knew you were going to be a computer programmer?

I was around 11 or 12. This was before the Internet existed, and there were these things called BBSs (bulletin board systems), that were linked to your computer and were like early chat rooms. I used to hang out on those a bunch and realized how much I loved computing and computers because they could make communities happen. I knew I was going to be a computer programmer in my senior year of high school. I took a C++ class with a teacher named Mrs. Gaul, and for the first time, I felt like the computers thought the same way that I thought: very logically.

Do you still see sexism and discrimination in the workplace?

I definitely experienced it when I first started my career. I know a lot of women who ended up leaving the industry because of it. I think the positive thing now, being on this side of my career, is that I can help mentor younger women and I can step in. Now that I’m older, I can step in when I see it happening to other women. I think it’s important that we’re all aware of and keep our eyes on these things.

Before you started Jewelbots, you were in Girl Develop It. How does your work in both organizations help to encourage more diversity in STEM fields?

The thing they both have in common is helping to teach women and girls that coding isn’t something that’s impossible to achieve. It can be something that’s fun and powerful. Often, I heard from women in Girl Develop It classes that they didn’t know what an engineer was until they got to college, and by then they felt it was too late to learn or take the classes needed. The interesting thing about what I do now with Jewelbots is to help encourage younger girls.

Would you say that the environment has changed since those first girls became women? Is it the same for kids now?

I’ve really seen a push to get more girls involved at a younger age, and I think that’s really important. It’s important that we help girls understand that this is something that is for them, it’s something that can really help their lives, that it’s something that they can really have fun doing.

Jewelbots has really changed and it’s developed into a really great community of girls. For me, that was something I always loved when I was first learning. As the CEO, how does the process of developing a product and working with beta testers like me change how you work?

I learned a ton! It really gave me respect for people who do product management and things that aren’t strictly engineering. Something I learned really early on is that a lot of assumptions that you make about a product can be wrong. Just because I happen to remember what it’s like to be a girl doesn’t mean that now, 20 years later, I understand what girls want today. One thing that it’s really taught me is to not make assumptions or pretend that I can understand what someone else might be facing just because I think I can imagine it or I think I can remember it from a long time ago. No matter what, it’s really important to talk to people and see what you are building with a product.

28 Women in STEM, an Interview codemag.com


Is there anything you’ve changed about Jewelbots based on feedback received?

Initially, we were going to make a bracelet that could change colors to match your outfit instead of a friendship bracelet. I thought that was a great idea! I thought that I remembered what it was like to be ten, and I know I would have loved that. But when I started talking to some girls that age, they were like “that sounds really boring, I don’t think I would do that.” I was just like wow, it’s so good we talked to people before making this whole thing that they wouldn’t have liked.

Recently, Jewelbits was released. What is it? What can you do with them?

Jewelbits are STEM-themed craft boxes where you can learn certain science concepts. The first box is a Hello World Neon Box. It gives you all the components you need to make a neon-colored sign that lights up different colors. The point of the boxes is to introduce other STEM concepts. They are also less expensive than Jewelbots. One thing we heard from parents a bunch as we were selling Jewelbots is that the price point is a little high. That it’s not something that some people could afford to do, you know, if it wasn’t a birthday or Christmas. The one thing we set out to do is make something that’s more accessible pricewise and still delivers the great STEM content and education that Jewelbots does. Our next box is Hello World Lava, full of lava beads that you can make friendship bracelets with for your friends. It’s filled with tassels and letters, and the lesson is all about lava from volcanoes. It’s about how hot lava gets, where it comes from, and how volcanoes work.

How often are Jewelbits going to be released?

We plan to release a box every few months. So far, people have been having a lot of fun with them and making really cool neon-colored signs. I’m excited about that.

That sounds really fun! Are these lessons also being taught in science classes? What’s special about using these boxes, as opposed to learning about it in school?

That’s a really good question. Often, these are things that people are learning in science class. The difference here is learning through play, something we at Jewelbots believe in a lot. I was an okay student, but when I really cared about something and it was something that I could play with and have fun with, those are the things that I really remember now, 20 years later. The goal is more education and a better grasp and understanding of concepts through play.

Jewelbots has its own YouTube channel. In your opinion, why are these videos important and how do they create a community? Why is a community so important?

Community really helps with learning, whether it’s the YouTube community or any other type of on- or off-line community. People really respond when they see other people their own age doing the same things they are. I think that’s a really neat thing about kids and girls.

On the Jewelbots website, it’s stated that your age market is 8-14-year-old girls. What’s so special about this particular age group? Why is Jewelbots targeted at this audience?

That’s one of my favorite questions. The age group is 8-14 because when I started talking to my peers who were coding, my male peers, I started asking them why did you get into coding and your sister didn’t? What’s happening here? Why did you find this when the girls in your life did not? Usually what I heard is that they were in middle school or elementary and they found a game, got really into gaming, and decided “when I grow up, I want to be a game developer.” Additionally, if you look at the research, you can see that somewhere in their preteen years is often when girls in western culture and the US start thinking about math and science as things that aren’t really for them. So that’s why we really wanted to aim for this age group, right at the same time they might be thinking that math and science aren’t for them. We want to really reach them with products that show them that math and science ARE for them.

The Jewelbots YouTube has tons of different challenge videos. What do you like about them?

I like when the challenge videos focus on friendship stuff and doing cool colors with friends. I really like that because it requires more than one person interacting, which is super fun.

What do readers of CODE Magazine need to know in order to empower their female coworkers and young girls to keep pursuing fields of STEM and to feel encouraged to do so?

The best thing that people who already have a career can do is sponsoring. That means not just being a mentor to them, but also giving them opportunities that may come to you. For example, if someone is recruiting you for a job or to speak at a conference, if it’s something you’re not going to do or even if it’s something you want to, helping someone that you’re mentoring into that opportunity is a great way to sponsor them throughout their career. That might mean if you know a young woman who’s studying computer science, just make yourself available for questions and/or talking things through. It’s important to make sure that they have an opportunity on the other side of school, too.

How should people reach out?

Anyone can reach out and just say, “how can I be helpful?” or ask, “are you job hunting?” “Are you practicing for interviews?” “Are you facing anything at work that you could use some help with from someone with more experience?” Just making yourself available and saying, “how can I help?” is a great way to do it.

Sumeya’s Answers to Sara’s Questions
Why is coding important to you?

It’s really important to me because I know that in this society, coding has definitely become prominent in general. Especially in my age group, technology is just so prominent and I really like coding as a great way to be creative. I haven’t really been able to do a lot of it since I started high school because high school is very demanding. But what I’ve always loved is the creativity about it. The community of getting to share the things you’ve done with other people. I think it’s important to know for the future because, like I said, it’s just so integral.

What’s the first thing you ever coded?

I’m pretty sure the first thing I coded was with my dad. We programmed this series of colors with the Jewelbots brace-



let. And it was my first time really getting into coding be- you’re using. I think the other ways that I’m going to be using
cause in my classes, I did one of those websites where it’s coding in my activism is creating social media. Social media
not really coding but you just learn the basics of it. I got to do that, but then with Jewelbots, I was able to do more raw coding, like going into Arduino IDE and actually putting the language in, and I just programmed a series of colors. I remember that was so exciting to me because I got to see what I created and I was like, “Oh wow, there’s a lot I can do with this.” And I told all my friends about it. I remember it was a really great bonding experience for me and my dad because he’s a computer programmer who really loves programming and he got to share that with me. So yeah, the first thing I coded was a rainbow series of flashing lights to match the outfit I was wearing. And then later on, I learned how to do the actual friendship coding aspects.

Can you tell me a bit about the activism work that you’ve done in the past?

I used to run a middle school youth group called “Besought Youth.” It was aimed at creating spaces for Muslims ranging from 12-14 years old, to just talk about being Muslim and really anything that they want to talk about. I didn’t have a lot of friends who were my age who were Muslim. And I didn’t always feel welcome to be able to speak up during things because adults sometimes are like, “No, you’re the kid. You need to sit down.” So it was really great because it was an environment where we kids could talk and we did a lot of community-based work when I was running it, which I no longer am because I’m in high school now. But when I was, we held a food drive and in the middle of the year, we worked on a book drive.

After that, some other activism that I’ve been doing is I’ve become involved with this really great organization called Kids4Peace. I’m in the Kids4Peace local chapter but there are chapters all over the world. We work on interfaith connections. It’s really great because we visit churches, synagogues, mosques, and all of these places of worship. You get to learn about all the different religions but then additionally, we do a lot of activism-based work about equity and working against racism, working with women’s rights, and of course discrimination against religions such as anti-Semitism and Islamophobia. That’s the kind of activism I’ve been doing right now.

That’s so awesome that you’ve done so much and you’re so young. I bet your parents are super proud.

I think that young people have to start standing up now because there’s a lot of work to do. You can’t just sit down, you know? Everybody I know that’s around my age is really interested in activism and is definitely working to fight for our world because you know, we have to take it up in like five years.

How do you see coding empowering your career as an activist in the future?

I definitely think I’m going to include coding in my activism. I think that the stereotype is “You have to be good at math” or “You have to be good at science to do computer programming” and those are two things I’m not particularly great at. I really think that I can use my voice to advocate that everyone should learn how to code. It doesn’t matter if it’s going to be part of your profession. It’s really important to learn because it helps you to understand the world and the tools that is a really great way to carry all these messages across that we want to talk about, informing people and using freedom of speech to share ideas and communicate between different places. I think coding social media is a really great way to do networking and that’s how it could be involved in activism and really working to empower everybody to learn.

Do you feel that your peers are interested in coding?

Yes, definitely. I don’t know if it’s like that all over the country: I was just talking to my mom about that because I live in a very tech-centered city. In my school, we have a coding club and it’s full. All of my friends who are girls go to it. All of my guy friends go to it; everybody goes to it and it’s really great because when a computer programming class opened, no question, all of my friends signed up for it. I think what inspired them is that here, a lot of opportunities have been pushed to get girls into coding. Ever since I was in fourth grade, we’ve had a lot of people come to talk to us about tech and I think what’s really great is that the schools I’ve been to have taken care to represent that there are males who work in coding and also females who work in coding. There are women in tech who work really hard.

We’ve had several instances at my school where they took us out of class to go to workshops to hear about women in STEM. For example, many of the girls in my class went to Nordstrom’s when they held a workshop and they talked all about entrepreneurship, coming from women from all different walks of life and ages. They talked about running websites and a lot of the things that you don’t know are happening at Nordstrom’s. It was really cool because they all had super powerful positions. Even if you’re not interested in coding, you still know that it’s an option. And my friends who are interested in coding know that they can be developers when they grow up if they want to be. With Jewelbots, I learned how important that is and seeing that we’re working to empower girls and that it’s really working out and people are confident in their skills. That’s really awesome.

In that vein, what’s your favorite thing that you’ve built with coding?

I programmed this Pomodoro Timer, and I’ve talked about this several times, but it was really cool for me to see “Oh my God, I can change the colors!” Obviously, this was with Jewelbots. And once I learned even more about coding, I was able to create this program where I could set up a timer and it would go on for five minutes of rest and then 30 minutes of work, and then five minutes of rest and I could actually use that in my homework. For me, that was really exciting because it was a time when I was able to use code in my own life and incorporate it into my lifestyle. The second best thing is the game Catch the Leprechaun, which was really fun for me because it was really cool seeing what you could do with code. That you could actually create a game on a bracelet that you wouldn’t think you could do that much with was really exciting. So those are my two favorite things I’ve built to this day.

What is Sumeya’s life like in 2025?

Oh my God. I was actually just talking to my mom about this in the car and I was like, “Oh, that’s like 10 years away”

30 Women in STEM, an Interview codemag.com


because I’m bad at math. And I’m like wait. Oh my God. It’s 2019. I’m going to be a junior in college in like five years. That’s terrifying. My mom is hoping I go to UC Berkeley, which would be fun because I love San Francisco. I think that what I’ll be doing is something with writing and journalism. I really love journalism. I think that I’ll be in college and definitely advocating for women, and I hope that I’ll be going to conventions, continuing my path of learning about coding because I think it’s important that everyone learns about it. And I also hope to continue my work with activism, empowering women and empowering all people, making sure to amplify the voices to whom injustices happen so they can be heard and their needs met.

The people at CODE Magazine want to know: What is the best way to inspire and excite young women to code?

Let’s see, something that I really love is just seeing these opportunities that I’m getting and that my friends are getting, and I really hope that’s happening across the nation. I think when code is talked about and celebrated, it’s definitely more exciting. Girls who learn about coding are more like, “Oh, this is like a real career.” I don’t think everybody knows exactly what coding is. I’ve talked to people who don’t understand how much of a great opportunity it is. I think one way that people who read CODE Magazine can help to empower girls and women is to see if young people can come in and learn about coding and learn about what you do at your workplace. Some of the things that hinder people is not knowing what coding is and thinking that they’re not empowered to do it. And when you don’t learn, you might be like “Oh this is really cool. But it feels kind of out of reach.” Going to someone’s workplace and seeing that they’re coding and they’re doing all these things, working or running a business, I think it definitely shows that you can do it and you can be inspired to do it. The best thing people can do is just inspire us all to work really hard (all young people, not just women and girls) to get to our goals and to learn about coding.

Sumeya Block, Sara Chipps


ONLINE QUICK ID 1911071

Stone Soup: Cooking Up Custom Solutions with SQL Server Machine Learning

Jeannine Takaki-Nelson
j.takaki@live.com
@jrrnt

While at Microsoft, Jeannine worked as a tester and wrote technical documentation for machine learning products, including SQL Server Data Mining, SQL Server Machine Learning, and Azure Machine Learning Studio. She’s currently retired, which gives more time to read about data science and run really inefficient code. She’s grateful to the many writers in the R-blogger and SQL Server community for their excellent examples and gentle explanations.

This article describes the machine learning services provided in SQL Server 2017, which support in-database use of the Python and R languages. The integration of SQL Server with open source languages popular for machine learning makes it easier to use the appropriate tool (SQL, Python, or R) for data exploration and modeling. R and Python scripts can also be used in T-SQL scripts or Integration Services packages, expanding the capabilities of ETL and database scripting.

What has this to do with stone soup, you ask? It’s a metaphor, of course, but one that captures the essence of why SQL Server works so well with Python and R. To illustrate the point, I’ll provide a simple walkthrough of data exploration and modeling combining SQL and Python, using a food and nutrition analysis dataset from the US Department of Agriculture.

Let’s get cooking!

Machine Learning, from Craft to Pro
You might have heard that data science is more of a craft than a science. Many ingredients have to come together efficiently, to process intake data and generate models and predictions that can be consumed by business users and end customers.

However, what works well at the level of “craftsmanship” often has to change at commercial scale. Much like the home cook who has ventured out of the kitchen into a restaurant or food factory, big changes are required in the roles, ingredients, and processes. Moreover, cooking can no longer be a “one-man show;” you need the help of professionals with different specializations and their own tools to create a successful product or make the process more efficient. These specialists include data scientists, data developers and taxonomists, SQL developers, DBAs, application developers, and the domain specialists or end users who consume the results.

Any kitchen would soon be chaos if the tools used by each professional were incompatible with each other, or if processes had to be duplicated and slightly changed at each step. What restaurant would survive if carrots chopped up at one station were unusable at the next? Unfortunately, the variety (and sometimes incompatibility) of tools used in data science means that a lot of work has had to be reinvented or created ad hoc and left unmanaged. For example, ETL processes often create data slices that are too big for analysis or they change core aspects of the data irreparably.

The core business proposition of integrating Python and R with SQL and the RDBMS is to end such duplication of effort by creating commercial-strength bridges among all the tools and processes.

• Your data science tools can connect securely to the database to develop models without duplicating or compromising data.
• You can save trained models to a database and generate predictions using customer data and leave optimization to your DBA.
• You can build predictive or analytical capacity into your ETL processes using embedded R or Python scripts.

Let’s look at how it works and how the integration makes it easier to combine tools as needed.

The article is targeted at the developer with an interest in machine learning (ML), who’s concerned about the complexity of ML and is looking for an easier way to incorporate ML with other services and processes. I’ve chosen “stone soup” as a metaphor to describe the process of collaboration between data scientists and database professionals to brew up the perfectly performant ML solution.

Security Architecture
First off, let’s be clear about the priorities in this platform: security, security, and security. Also, accountability, and management at scale.

Data science, like cooking, can be tremendous fun when you’re experimenting in your own kitchen. Remove variables, mash data into new formats, and no one cares if the result is half-baked. But once you move into production, or use secure data, the stakes go up. You don’t want someone contaminating the ingredients that go into a recipe or spying on your data and production processes. So how do you control who’s allowed in the kitchen, when you can’t have just anyone involved in preparing your food or touching your data?

With ML in SQL Server, security and management are enforced at four layers (see Figure 1):

• Isolation of Python or R processes: When you install the ML service, the database server gets its own local instance of Python (or R). Only a database administrator or someone with the appropriate permissions can run scripts or modify installed packages. (No more installing packages from the Internet on a whim.)
• Secure lockdown of Python launcher: The stored procedure that calls the Python (or R) runtime is not enabled by default; after the feature has been installed, an administrator must enable external code execution

at the server level, and then assign specific users the permissions to access data and run the stored procedure.
• Data access: Managed by traditional SQL Server security. To get access to data, you must have database permissions, either via SQL login or Windows authentication. You can run Python or R entirely in the server context and return the results back to SQL Server tables. If you need more flexibility, data scientists with permission to connect to the database can also connect from a remote client, read data from text files stored on the local computers, and use the XDF file format to make local copies of models or intermediate data.
• Internal data movement and data storage: The SQL Server process manages all connections to the server and manages hand-offs of data from the database to the Python or R processes. Data is transferred between SQL Server and the local Python (or R) process via a compressed, optimized data stream. Interim data is stored in a secure file directory accessible only by the server admin.

Figure 1: Four layers of security and management for Python

Whereas data science used to be a headache for control-minded DBAs, the integrated ML platform in SQL Server provides room for growth, as well as all the monitoring and management required in a commercial solution. Compare this to the old paradigm of exporting customer data to a data scientist to build a model on an unsecured laptop. Add in the SQL Server infrastructure that supports monitoring (who viewed what data, who ran which job, and for how long), infrastructure that would be complex to implement in an all-Python or R environment.

For details on the new services, and the network protocols used to exchange data between Python and SQL Server, I recommend the articles listed in Table 1 from Microsoft:

Resource | Link
Introduction to the extensibility framework | https://docs.microsoft.com/sql/advanced-analytics/concepts/extensibility-framework?view=sql-server-2017
Network protocols and how Python is called from SQL Server | https://docs.microsoft.com/sql/advanced-analytics/concepts/extension-python?view=sql-server-2017
Security details at the database level | https://docs.microsoft.com/sql/advanced-analytics/concepts/security?view=sql-server-2017

Table 1: Security and architecture resources

Now that I’ve touted the advantages of the platform, let’s look at some of the drawbacks:

From the standpoint of the data scientist (the freewheeling home cook, if you will), the framework is far more restrictive. You can’t install just any Python or R library onto the server. Some packages are incompatible with a database environment, and often the package you need isn’t compatible with the version installed with the server.

Some standardization and refactoring of your Python or R code will also be required. Just as in your commercial kitchen, where vegetables have to be diced to a particular size and shape or else, your data has to match SQL’s requirements. You can’t dump in just any Python code, either. Code typically has to be rewritten to work inside a SQL Server stored procedure. Usually this work is trivial, such as getting tabular data from SQL Server rather than a text file and avoiding incompatible data types.

From the standpoint of the DBA, drawbacks include not just the crazy data scientists asking for Python installs, but new workloads. The administrator must allocate server resources to support ML workloads, which can have very, very different performance profiles. ML also uses new database and server roles to control script execution as well as the ability to install Python or R packages. Other new tasks for the DBA include backing up your ML data, along with your data science users and their installed libraries.

The SQL Server development team put a lot of effort into figuring out workflows that support data science without burdening the DBA too much. However, data scientists who lack familiarity with the SQL Server security model might need help to use the features effectively.

Package Installation and Management
Security is great, but the data scientist needs to be able to install open source Python or R packages. Moreover, they expect to install those new packages and all their dependencies straight from the Internet. How does this even work in a secured environment?

First off, the default installation includes the most popular packages used for data science, including nltk, scikit-learn, numpy, etc. SQL Server also supports installing new packages and sharing packages among a group of data scientists. However, the package installation process is restricted to admins and super users. This is understandable because new Python or R libraries can be a security risk. Also, if you install version x.n of a Python package, you risk breaking the work of everyone who’s been using a different version of the package.

Therefore, a database administrator typically has to perform or approve the installation. You can install new packages on the server directly if an admin gives you permissions to install packages. After that, installation is as easy or hard as any other Python install, assuming the server can access the Internet. Whoops. Fortunately, there are workarounds for that too.

The SQL Server product team has thought long and hard about how to enable additional packages without breaking the database, annoying the DBA, or blocking data scientists. Package management features in SQL Server 2017 let the DBA control package permissions at the server and database level. Typically, a super user with the correct database role installs needed packages and shares them with a team. The package management features also help the DBA back up and restore a set of Python libraries and their users. Remote installation is also supported for R packages.

Because this feature is complex, I won’t provide more details here. Just know that in a secure server, there are necessarily restrictions on package installation. Table 2 lists some great resources.

Some caveats before I move on:

• Azure SQL DB uses a different method for managing packages. Because multiple databases can run in a container, stricter control is applied. For example, the SQL Server ML development team has tested and “whitelisted” R packages for compatibility and use in Azure SQL DB. At this time, the R language is the only one supported for Azure SQL DB.
• There is no comparable “whitelist” of Python packages that are safe to run on SQL Server. ML in the Linux edition of SQL Server is also still in preview. Watch the documentation for more details.

Management, Optimization, and Monitoring
If you’re a database professional, you already know how to optimize server performance and have experienced the challenges of balancing multiple server workloads. For ML, you’ll want to make full use of your DBA’s knowledge in this area and think hard about server allocation. But you’ll also need to lean hard on your data scientist.

Let’s start with the basics. Calling Python (or R) does add processing time. Like any other service, you’ll notice the lag the first time the executable is called, or the first time a model is loaded from a table. Successive processing is much faster, and SQL Server keeps models in cache to improve scoring performance.

If you set up some event traces, you might also detect small effects on performance from factors such as:

• Moving data, plots, or models between a remote client and the server
• Moving data between SQL Server and the Python or R executables
• Converting text or numeric data types as required by Python, R, or the RevoScale implementations

(For the nitty-gritty details of performance, I strongly recommend the blog series by SQL Server MVP Niels Berglund on Machine Learning Services internals: https://nielsberglund.com/2018/05/20/sp_execute_external_script-and-sql-compute-context---i/)

Considered as a platform, SQL Server Machine Learning offers a lot of options for optimization. Several of the most important use cases have been baked into the platform. For example, native scoring uses C++ libraries in T-SQL (https://docs.microsoft.com/sql/advanced-analytics/sql-native-scoring?view=sql-server-2017) to generate predictions from a stored model very fast. Optimized correctly, this feature can generate as many as a million predictions per second (see: One million predictions per second: https://blogs.technet.microsoft.com/machinelearning/2016/09/22/predictions-at-the-speed-of-data/).

The key phrase is “optimized correctly.” To get the super-performance goodies, you really need to know something about server configuration, SQL Server optimization, the algorithms and distributed computing features in RevoScaleR/revoscalepy, and, of course, some basic R or Python optimization. That’s a tall order, and it’s another reason you benefit from having multiple contributors to your stone soup.

For example, ML workloads can have very different profiles depending on whether the task is training or scoring, which algorithm has been used, and how big or wide the data is (see Figure 2: Optimization for data size vs. model complexity).

Figure 2: Optimization depends on both data size and model complexity

Resource | Link
Package management roles | https://blogs.msdn.microsoft.com/microsoftrservertigerteam/2017/05/11/enterprise-grade-r-package-management-made-easy-in-sql-server/
Using sqlmlutils to install packages remotely | https://docs.microsoft.com/sql/advanced-analytics/package-management/install-additional-r-packages-on-sql-server?view=sql-server-2017

Table 2: Package management resources
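
The version risk described under Package Installation and Management (installing one version of a package can break work that relied on another) can be checked before it bites. The following is a minimal sketch, not part of the walkthrough: it compares installed package versions against an agreed list using the standard library module importlib.metadata. The function name check_versions and the pinned versions are hypothetical, and importlib.metadata requires Python 3.8 or later, so on an older interpreter you would need an equivalent lookup.

```python
from importlib import metadata


def check_versions(expected):
    """Compare installed package versions against an expected mapping.

    `expected` maps a distribution name to the version string the team
    agreed on (a hypothetical convention, not a SQL Server feature).
    Returns a dict of name -> (expected, installed) for mismatches only;
    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    # Illustrative pins only; substitute the versions used on your server.
    pins = {"numpy": "1.12.1", "scikit-learn": "0.18.1"}
    for pkg, (wanted, found) in check_versions(pins).items():
        print(f"{pkg}: expected {wanted}, found {found}")
```

Running the same check on the client and inside a server-side script gives both the data scientist and the DBA a concrete list to discuss before anything new is installed.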

Task | Description and link to resource
Optimize Windows server and SQL Server | Although this case study was originally for R, most of the tips apply to Python models as well. The experiment compares a solution before and after server optimizations such as use of NUMA and maximum parallelism. https://docs.microsoft.com/sql/advanced-analytics/r/sql-server-configuration-r-services?view=sql-server-2017 Be sure to catch this part of the series, which covers use of compression and columnstore indexes: https://docs.microsoft.com/sql/advanced-analytics/r/sql-server-r-services-performance-tuning?view=sql-server-2017
Optimize for concurrent execution | The Microsoft Tiger Team captures real-world customer problems and periodically distills them into useful blogs. https://blogs.msdn.microsoft.com/microsoftrservertigerteam/2016/09/20/tips-sql-r-services-optimization-for-concurrent-execution-of-in-database-analytics-using-sp_execute_external_script/
Choose models and data processing methods | There are many ways that the RevoScale platform can improve performance: enhanced numerical computation, streaming and batching, parallel algorithms, and pretrained algorithms. This guide to distributed and parallel computing provides a high-level introduction to the types of distributed computing provided by the RevoScale algorithms. https://docs.microsoft.com/machine-learning-server/r/how-to-revoscaler-distributed-computing
Use pretrained models | Talk about a shortcut: the pretrained models in microsoftml (for Python and R) support sentiment analysis and image recognition, two areas where it would be impossible for most users to get and use enough training data. https://docs.microsoft.com/sql/advanced-analytics/install/sql-pretrained-models-install?view=sql-server-2017
Manage SQL resources | Resource Governance is an awesome feature for helping manage ML workloads, although it’s available only with Enterprise Edition. https://docs.microsoft.com/sql/advanced-analytics/administration/resource-governance?view=sql-server-2017
Optimize for specific high-priority tasks | As noted earlier, fast scoring is particularly important for enterprise customers. There are lots of ways to accomplish this based on whether you are using a single server or distributed servers and even a Web farm. https://docs.microsoft.com/sql/advanced-analytics/r/how-to-do-realtime-scoring?view=sql-server-2017 https://docs.microsoft.com/machine-learning-server/operationalize/concept-what-are-web-services

Table 3: Performance optimization resources
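
The first-call lag described under Management, Optimization, and Monitoring is easy to measure for yourself. The sketch below is an illustration only, written with the Python standard library: time_calls times repeated calls to any function and separates the first (cold) call from the later (warm) ones. FakeScoringCall is a hypothetical stand-in that simulates a one-time model load; in practice you would pass in your own function, for example one that executes a scoring stored procedure from a client.

```python
import statistics
import time


def time_calls(func, repeats=5):
    """Call `func` `repeats` times and report the first (cold) call
    separately from the median of the later (warm) calls, in seconds."""
    if repeats < 2:
        raise ValueError("repeats must be at least 2")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return {
        "first": timings[0],
        "median_rest": statistics.median(timings[1:]),
    }


class FakeScoringCall:
    """Hypothetical stand-in for a call that loads a model on first use
    and reuses a cached copy afterwards. Replace with a real call."""

    def __init__(self, load_seconds=0.05):
        self._load_seconds = load_seconds
        self._model = None

    def __call__(self):
        if self._model is None:
            time.sleep(self._load_seconds)  # simulate the one-time load
            self._model = object()
        return self._model


if __name__ == "__main__":
    baseline = time_calls(FakeScoringCall())
    print(f"first call: {baseline['first']:.4f}s, "
          f"later calls (median): {baseline['median_rest']:.6f}s")
```

Recording these two numbers before and after each configuration change gives a simple point of comparison for the tuning resources in Table 3.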

Your big data might require minimal resources if processed in parallel or by using batches, compared to a neural network model using lots of features, or even a small dataset with an algorithm that must keep the entire dataset in memory until processing is complete.

If you’re curious about the performance characteristics of a particular type of model, the ML literature these days is chock-full of interesting research on which algorithm is better or faster, how much data is enough or better, and what constitutes complexity. You can even find cheat sheets specific to a type of algorithm, such as a comparison of the different implementations of decision trees or neural networks in terms of feature size, processing capacity, etc. Table 3 is a good start on these resources.

The key to success is capturing a baseline so that your DBA and your data scientist can begin the process of optimization: figuring out which processes are taking the most time, and how you can adjust your data sources and code to streamline performance. The goal here is simply to provide a set of starter resources that you can use to begin to optimize a workload in SQL Server Machine Learning.

Python and SQL: A Walkthrough
Let’s get cooking! For this walkthrough, the goal is simple: to get the server and client tools set up and learn the basics of the stored procedure sp_execute_external_script.

The first two sections cover basic set up of Machine Learning Services on SQL Server, as well as set up of R or Python client tools. If you already have SQL Server 2017 installed, including the ML features, you can skip the first part.

In the third and fourth sections, I’ll explore a simple data set. The data was obtained from the US Department of Agriculture and represents a summary of food stamp spending across the nation.

Finally, I’ll build a simple visualization using a Python package.

Prepare the Environment
For specific set up steps, see the Microsoft documentation. Links to all the pertinent set up docs are provided in Table 4, near the end of this section. Set up of the server takes about an hour and ditto for the client tools.

Choosing which features to install, and which version, is the first step. Which features you install depends on which version is available and what you will be doing with ML. Figure 3 summarizes the versions of SQL Server that support ML.

Figure 3: Versions of SQL Server that support machine learning

For this demo, I installed Developer Edition for SQL Server 2017, because Python became available starting in 2017.

You can use a laptop or any other personal computer with sufficient memory and processing speed, or you can create an Azure Windows VM. Remember, you need to meet at least the minimum SQL Server requirements and then have extra memory to support the Python or R workloads. Such an environment will let you try out all the features, including passing data between R and Python.

Setup type | Link
Set up SQL Server with Python | https://docs.microsoft.com/sql/advanced-analytics/install/sql-machine-learning-services-windows-install?view=sql-server-2017
Python client | https://docs.microsoft.com/sql/advanced-analytics/python/setup-python-client-tools-sql?view=sql-server-2017
R client | https://docs.microsoft.com/sql/advanced-analytics/r/set-up-a-data-science-client?view=sql-server-2017
Troubleshooting and Known issues | https://docs.microsoft.com/sql/advanced-analytics/known-issues-for-sql-server-machine-learning-services?view=sql-server-2017 https://docs.microsoft.com/sql/advanced-analytics/common-issues-external-script-execution?view=sql-server-2017
What’s different between the versions? | https://docs.microsoft.com/azure/sql-database/sql-database-machine-learning-services-differences

Table 4: Set up and troubleshooting resources

If you know that you won’t be building models, only running predictions, you have many more options. You can build a model on your beefiest server, save it to a table, and then copy that model to another server or into Azure SQL DB to generate predictions.

Before you begin installation, be aware that set up is a multistage process, which includes reconfiguration of SQL Server to enable the ML features, and possibly changing firewall or network protocols, followed by an optional client install, and testing of client connectivity. Figure 4 shows these stages.

Figure 4: Multiple stages of set up and development

Caveats:
• Be sure to choose the Machine Learning Service in SQL Server.
• Do not install the “standalone” Machine Learning Server. That’s a different product, included in SQL Server set up mostly for licensing reasons. Basically, if you have an enterprise license agreement, you can install Machine Learning Server on a different computer and use that either as a development suite, or for distributed computing on Windows/Linux without SQL Server.
• After set up, do run all the connection tests suggested in the documentation. And see the troubleshooting tips listed in Table 4.

After the server is installed and ML features have been enabled, consider whether you need to install a remote client. The free client tools from Microsoft basically give you the same R or Python packages that you run on the server, to use for testing and developing solutions remotely. You must have the client to interact with SQL Server; you can’t run a vanilla instance of PyCharm or RStudio.

That said, not everyone will need to set up a client, and such software might not be allowed in certain highly secured environments. If you can accept the limitations around debugging and viewing charts, you can develop and run everything in SQL Server.

However, there are good reasons to set up a client. One is that SQL Server Management Studio does not support the R graphics device and can’t display charts returned from Python. If you want to view charts, find a client. For model development or data processing, it’s not critical.

The download that installs the R and Python clients includes some basic tools, but not a full-fledged IDE. You might want to install an IDE or hook up an existing IDE.

• Jupyter notebook is included with the Python client install and offers support for charts.
• Spyder is included with the Python client install, but the IDE is not preconfigured to work with the revoscalepy packages.
• If you install another IDE to use as client, such as Visual Studio or PyCharm, you must create a Python environment that uses the correct libraries. Otherwise your IDE will, by default, use the Python executable named in PYTHON_PATH. Configuring this and getting it to work can be tricky. I used Visual Studio 2019, which has nice support for Python.
• The client included with the R install is much simpler to set up and configure. It’s also relatively easy to run RevoScaleR from RStudio or other popular IDEs.
• If you don’t have administrative rights to the SQL Server computer, connection from a remote client can be tricky. See the Troubleshooting Tips and Known Issues list (in Table 4) for current firewall and network issues.

Table 4 is a list of some resources for setting up and troubleshooting.

If you have an existing set up, take some time to verify the version of the Python (or R) executable that is used by SQL Server. You can run the following code in T-SQL (either in SSMS or a remote client like Azure Data Studio) to verify the version of Python installed on the server:

-- Print the interpreter version and the module search path used by SQL Server
EXECUTE sp_execute_external_script
@language = N'Python',
@script = N'import sys
print(sys.version)
print("\n".join(sys.path))'

The version of revoscalepy also must be the same on the server and any client you connect from. Run the following T-SQL code to check the revoscalepy version.

36 Stone Soup: Cooking Up Custom Solutions with SQL Server Machine Learning codemag.com
    -- check Revoscalepy version
    EXECUTE sp_execute_external_script
    @language = N'Python',
    @script = N'
    import revoscalepy
    import sys
    print(revoscalepy.__version__)
    print(sys.version)
    '

Basic Tools and Recipes
SQL Server Machine Learning uses a stored procedure to encapsulate all Python (or R) code, as well as some related services under the covers. All interactions with the Python executables are managed by SQL Server, to guarantee data security. You call the stored procedure like you would call any other SQL command and get back tabular results. This architecture makes it easy to send data to ML, or to get back predictions: Just make a T-SQL call.

The key requirements are simple:

• Pass in a supported language as nvarchar.
• Pass in a well-formed script, as nvarchar. With Python, the “well-formed” part can be tricky, as all the rules about spaces, indents, and casing apply even in the context of the SQL design window.
• Provide inputs from SQL Server (typically as a variable, query, or view).
• Align the inputs to the variables in your Python code.
• Generate results from Python that can be passed back to SQL Server. You get one and only one tabular dataset back, as a data frame, but can return multiple other values or objects as SQL variables (models, charts, individual values, etc.).

For now, let’s assume that you just want to view the data from the USDA analysis in SQL and maybe do something with it in Python. (You can use your own data, but I’ve provided a database backup. The dataset is quite small by SQL standards.)

The following code merges two views as inputs and returns some subset from Python.

    EXECUTE sp_execute_external_script
    @language = N'Python'
    , @input_data_1 = N'
    (SELECT * FROM [dbo].[vw_allmidwest])
    UNION (SELECT * FROM [dbo].[vw_allsouth]) '
    , @input_data_1_name = N'SummaryByRegion'
    , @script = N'
    import revoscalepy
    import pandas as pd
    df = pd.DataFrame(SummaryByRegion)
    Midwest = df[df.Region == "midwest"]
    OutputDataSet = Midwest'
    WITH RESULT SETS ((col1 nvarchar(50),
    col2 varchar(50), Rank1 int,
    Amt1 float, Pct1 float, Rank2 int,
    Amt2 float, Pct2 float))

It’s not a practical example, but it demonstrates some key aspects of running Python (or R) code in SQL Server:

• The stored procedure generally takes its input from T-SQL, not from a text file, console, or other ad hoc source.
• The column names from SQL aren’t necessarily preserved in the output, although they’re available to the Python code internally. You can change column names as part of your R or Python code.
• The default inputs and outputs are InputDataSet and OutputDataSet. All identifiers are case-sensitive. You can optionally provide new names for the inputs and outputs.
• Tabular data returned to SQL Server has to be a data frame (pandas for Python). Errors are generated whenever you return something that isn’t a data frame, even if implicit conversion is sometimes allowed.
• The Launchpad service returns error messages and other status text to the console (in SQL Server Management Studio, the Messages pane). The error messages are generally helpful although verbose.
• Providing a SQL schema for the output data is optional but can help your database developer.

A word before I go any further: SQL Server and Python are kind of like a chain saw and a food processor. Both can process huge amounts of data, but they differ in the way they chop it up. Python has lists, series, matrices, and other structures that often can be converted to a data frame, and sometimes can’t. R, although it’s a delightfully flexible language for data exploration, has its own quirks.

Such differences can break your code if you aren’t aware of them or don’t prepare your code to account for the differences. Be sure to review the data type conversion topics listed in Table 5, as well as the Known Issues in Books Online, before you port over existing code.

Restrictions on Package Installation in SQL Server Support Security
You can’t use a Python or R user library no matter where it’s installed or how you call it. New packages must always be installed in the server context.

You might consider these added hurdles to be a blessing, given the headache caused by package instability and the proliferation of Python environments.

Resource | Description
https://docs.microsoft.com/sql/advanced-analytics/python/python-libraries-and-data-types?view=sql-server-2017 | Data type mismatches and other warnings for SQL to Python conversion
https://docs.microsoft.com/sql/advanced-analytics/r/r-libraries-and-data-types?view=sql-server-2017 | Data type mismatches and other warnings for SQL to R conversion
https://docs.microsoft.com/sql/advanced-analytics/r-script-execution-errors?view=sql-server-2017 | Issues that apply only to R scripts

Table 5: Known issues and data type conversion
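The call pattern described in this section (a language and a well-formed script, both passed as nvarchar, plus an optional named input query) can be sketched from the client side. This is a minimal illustration under stated assumptions, not part of the article’s code: the helper names tsql_literal and build_external_script_call are hypothetical, and the function only composes the T-SQL text; it does not connect to a server. It relies on two facts: a single quote inside a T-SQL string literal is escaped by doubling it, and Python’s textwrap.dedent removes common leading indentation so the embedded script stays well-formed.

```python
import textwrap


def tsql_literal(text):
    """Return text as a T-SQL nvarchar literal: N'...' with single quotes doubled."""
    return "N'" + text.replace("'", "''") + "'"


def build_external_script_call(script, input_query=None, input_name="InputDataSet"):
    """Compose (but do not run) a call to sp_execute_external_script.

    Hypothetical helper for illustration only.
    """
    # Python is indentation-sensitive, so strip the indentation that the
    # surrounding client code adds to the embedded script.
    body = textwrap.dedent(script).strip()
    lines = [
        "EXECUTE sp_execute_external_script",
        "    @language = N'Python'",
        "  , @script = " + tsql_literal(body),
    ]
    if input_query is not None:
        lines.append("  , @input_data_1 = " + tsql_literal(input_query))
        lines.append("  , @input_data_1_name = " + tsql_literal(input_name))
    return "\n".join(lines)


demo_sql = build_external_script_call(
    """
    df = SummaryByRegion
    OutputDataSet = df[df.Region == 'midwest']
    """,
    input_query="SELECT * FROM [dbo].[vw_allmidwest]",
    input_name="SummaryByRegion",
)
print(demo_sql)
```

Composing the text separately makes it easy to inspect the exact script that SQL Server will receive, which helps when an indentation or quoting problem produces a verbose error in the Messages pane.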

Figure 5: Comparison of major food purchases per region

Explore Some Data


You’re welcome to use AdventureWorks or any existing data to play with Python. The dataset provided here is extremely simple and any similar dataset would do. For this article, I used nutritional and purchase data related to the food stamp program.

The Nutrition database (provided as a backup file) was imported from a USDA report on the Supplemental Nutrition Assistance Program, known as SNAP (or food stamps). The study analyzed food stamp purchases nationwide, and classified purchases by food and nutrition type, to try to understand if money was being well spent, and how the program might be improved.

The principal finding by the USDA was that there were no major differences in the spending patterns of households that use food assistance vs. those that do not. For example, both types of households spend about 40 cents of every dollar of food expenditures on basic items such as meat, fruits, vegetables, milk, eggs, and bread. However, the study found some ways to improve access to healthy food choices for SNAP participants. For example, authorized retailers were required to offer a larger inventory and variety of healthy food options, and convenience-type stores were asked to stock at least seven varieties of dairy products, including plant-based alternatives.

I’ll do some easy data exploration to find additional insights into food consumption by food stamp users:

• Differences between regions in terms of seasonal vegetable consumption, or meat purchases
• Top commodities for each region and for all regions
• Differences based on age of the head of household and poverty level of the surrounding community

Such descriptive statistics have long been the domain of traditional BI, and there are lots of tools for displaying nice summary charts, from Power BI to Reporting Services and plain old T-SQL, to the many graphing libraries in Python and R. Fortunately, in the SQL Server 2017 environment, you’re not constrained to any one tool and can use whatever gets the job done.

Figure 6: Differences between SNAP and non-SNAP households for infant formula

I decided that it would be interesting to compare the three regions included in the report. A bar chart might work, but radar charts are also handy for displaying complex differences at a glance. A radar chart doesn’t show much detail, but it does highlight the similarities, and might suggest some areas a nutritionist might want to dig into, such as the heavy use of frozen prepared foods. See Figure 5 for the summary of purchases by food type, per region.

The cool graphic was produced not in Python at all, but in Excel. The Python library matplotlib includes a function for creating a radar chart, but it was pretty complex, and I’m a Python amateur, whereas creating a radar chart in Excel takes only a few clicks. That’s right; I don’t particularly care which tool I use, as long as I can explore the data interactively. I don’t

have to ask my users to learn Python or code up an interactive interface for them; the data is in SQL Server and easily accessible to existing tools that my users are familiar with.

As long as I had Excel open, I wanted to try out a new ML feature in Excel. The Ideas pane, which debuted in Office 365 in late 2018, takes any tabular data set, and computes some quick correlations on every value in every dimension in the table. Imagine how complicated that code would be if you had to write it yourself! The Ideas feature returns a series of text boxes listing the most interesting correlations.

For example, I created some quick features on the dataset that represent the delta between target groups in terms of percentage expenditures. Features were generated for poverty level in the county, the age of the head of the household, and, of course, the region (Midwest, South, and West). Analyzing the total matrix of correlations for these features took about three seconds, and Excel returned 35 different “ideas.”

The Idea presented in Figure 6 tells you that there is a bigger difference than expected between SNAP and non-SNAP households in terms of their purchases of infant formula.

The wording is a bit opaque, and you would have to do further analysis to see exactly what this means. But in fact, this particular correlation was one of the primary findings in the original USDA study, so the results are valid.

Other “ideas” from Excel suggested that ice cream purchases were higher in southern households than in other regions, and that southern households spent more money on meat than those in the West or Midwest. What’s behind those differences? Ideas is a fun, easy way to start the data exploration process.

Food is Big Data
Uses for data science are growing fast in the food industry. For example, there’s an app that can take a picture of your meal and use image recognition to tell you how many calories and other nutrients it contains. Another ambitious project uses neural networks and crowd-sourced data to identify possible connections between food allergens and diseases.

Predictive Analytics
In the original study, the goal of analysis was to determine whether there were significant differences between consumption patterns of food stamp users vs. other consumers, and the answer was clearly negative. That’s okay. We tend to expect that brilliant insights will emerge from every set of data and forget that the original goal of statistical analysis is to disprove that an effect exists.

A secondary goal was to explore areas where potentially the program could be modified to assist SNAP users or improve store offerings. There are lots of ways to do exploratory analysis and I would have loved to use one of the newer clustering methods, but the summary data was too sparse. So let’s fall back on a favorite “broad qualitative method:” the word cloud.

The wordcloud package in Python makes a handy example for these reasons:

• It’s not installed by default but it has very few dependencies. All those required packages are already installed.
• The method for outputting an image file is a useful one to know. Neither SSMS nor most other SQL tools (such as Azure Data Studio) can directly display graphs created by Python or R. Instead, you output the plot as a binary data stream, which you can use elsewhere. This method is important because it’s also how you output a model and save it to a table for reuse.

Step 1. Install wordcloud on the server and client. The default installation of SQL Server Machine Learning includes many of the most popular packages for ML, and to try out the features, you don’t need to install additional packages. To see a list of all currently installed packages, you can run something like this:

    -- view packages in instance library
    EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
    import pip
    import pandas as pd
    # pip.get_installed_distributions() exists only in pip releases before 10
    installed_pkgs = pip.get_installed_distributions()
    installed_pkgs_list = sorted(["%s==%s" % (i.key, i.version)
        for i in installed_pkgs])
    df = pd.DataFrame(installed_pkgs_list)
    OutputDataSet = df'
    WITH RESULT SETS ((PackageVersion nvarchar(150)))

Assuming that wordcloud isn’t already present, you can install it on the SQL Server instance by opening a command prompt as administrator at the folder containing the Python scripts library, typically C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\Scripts.

    pip.exe install wordcloud

To install wordcloud on the client might require a little more work. On my computer, multiple Python environments complicated the issue to the point where I finally removed all old client tools and old Python versions and installed Visual Studio 2019 Community Edition. (Although only in preview release, it has a nice UI and some improvements in Python support.)

Ensure that the custom environment uses the downloaded Microsoft Python client, to match the Python libraries on

SQL Server, and open a command prompt (not a Python interactive window) to run pip as follows:

    python -m pip install wordcloud

Step 2. Prepare text data used for the word cloud. There are many ways to create and provide weights to a word cloud. To simplify the demo, I used Python to process the list of top commodities for each region and wrote that data back to a table.

Data preparation is another area where the SQL Server platform gives you the ability to use the most convenient, fastest tool for the job. This data set had very short text entries, so I merely concatenated the text and removed nulls, but you can imagine text data sources where the ability to process data in Python’s nltk and return the tokenized text to SQL Server would be useful. On a later iteration, I’ll probably add a stopword list or expand abbreviations.

    INSERT [dbo].[WesternFoods]
    EXECUTE sp_execute_external_script
    @language = N'Python'
    , @input_data_1 = N'(SELECT Region,
    [Subcommodity], [CompositeSubcat],[OtherSubcat]
    ,[SNAP_Pct] FROM [dbo].[vw_FoodListWest])'
    , @script = N'
    import revoscalepy
    import pandas as pd
    df = pd.DataFrame(InputDataSet)
    # prevent Python from inserting None
    df = df.fillna("")
    # parentheses let the expression continue across lines
    df["mergedtext"] = (df["Subcommodity"].map(str)
        + " " + df["CompositeSubcat"].map(str))
    print(list(df.columns.values))
    OutputDataSet = df[["Region","SNAP_Pct",
        "mergedtext"]]'

Note: There’s some slight cost incurred when moving data between SQL Server and Python, but the pipeline is highly compressed and optimized; certainly, it’s faster than moving data across the network to a Python client.

Step 3. Create the plot. Having extracted the table of words and weights, it’s simple to input the data to sp_execute_external_script as a view or query, and build a word cloud using Python or R. The Python script has these basic steps:

1. Import the required libraries.
2. Put word data from the SQL query into a data frame.
3. Create a word cloud using Python libraries.
4. Dump the plot objects as a serialized variable using pickle.
5. Save the variable as an output to SQL Server.

You can see the full text of the stored procedure in Listing 1, but this excerpt shows the key steps:

    from wordcloud import WordCloud, ImageColorGenerator

    # Handle and prepare data
    df = pd.DataFrame(WesternFoods)
    descriptors = df.MergedText.values
    text = descriptors[0]
    wordcloud = WordCloud().generate(text)
    plot0 = pd.DataFrame(data=[pickle.dumps(wordcloud)], columns=["plot"])

The pattern of creating a complex object and saving it to a binary data stream is standard for handling complex structures like plots or predictive models. SQL Server can’t understand or display them, so you generate the object, save it as a binary data stream, and then pass the variable to another SQL statement or client or save it to a table.

In the case of a predictive model, you’ll generally save the model to a table. That way you can save and manage models, and add metadata about when the model was run, on how many rows of data, and which prediction runs it was used for. To see an example of this process for models used in production, I recommend this tutorial from the Microsoft data science team: Python for SQL developers (https://docs.microsoft.com/sql/advanced-analytics/tutorials/sqldev-py3-explore-and-visualize-the-data?view=sql-server-2017).
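The idea of saving a serialized model to a table together with metadata can be tried without a SQL Server instance. The sketch below is an illustration under loud assumptions: it uses Python’s built-in sqlite3 module as a stand-in for SQL Server (a BLOB column plays the role of varbinary(max)), a plain dictionary as a stand-in for a trained model, and a hypothetical table layout (models, trained_at, training_rows) that does not come from the article or the tutorial.

```python
import pickle
import sqlite3
from datetime import datetime, timezone

# Stand-in for a trained model; any picklable object works the same way.
model = {"intercept": 0.5, "coefficients": [1.2, -0.7]}

conn = sqlite3.connect(":memory:")  # stand-in for a SQL Server database
conn.execute(
    "CREATE TABLE models ("
    " name TEXT PRIMARY KEY,"
    " trained_at TEXT,"
    " training_rows INTEGER,"
    " model BLOB)"  # BLOB plays the role of varbinary(max)
)

# Serialize the object to bytes and store it next to its metadata.
conn.execute(
    "INSERT INTO models (name, trained_at, training_rows, model) VALUES (?, ?, ?, ?)",
    ("demo_model", datetime.now(timezone.utc).isoformat(), 1000, pickle.dumps(model)),
)

# Later (or elsewhere): read the bytes back and rebuild the object.
row = conn.execute(
    "SELECT training_rows, model FROM models WHERE name = ?", ("demo_model",)
).fetchone()
restored = pickle.loads(row[1])
print(row[0], restored)
conn.close()
```

Because unpickling can execute arbitrary code, a table like this should only ever be loaded from sources you trust, which is one more reason to keep the serialized objects inside the database’s security boundary.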

Listing 1: Stored procedure that creates the Western word cloud

    USE [Nutrition]
    GO

    /*** Object: StoredProcedure [dbo].[mysp_Westernwordcloud] ***/

    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE PROCEDURE [dbo].[mysp_Westernwordcloud]
    as
    BEGIN
    EXECUTE sp_execute_external_script
    @language = N'Python'
    , @input_data_1 = N'SELECT CAST (ROUND([Weight1] * 100, 0) as INT) as Weights,
    [MergedText] FROM [dbo].[WesternFoods]'
    , @input_data_1_name = N'WesternFoods'
    , @script = N'
    import revoscalepy
    import pandas as pd
    import pickle
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

    # Handle and prepare data
    df = pd.DataFrame(WesternFoods)
    descriptors = df.MergedText.values
    ncounts = df.Weights.values
    print(type(ncounts))

    text = descriptors[0]
    print(type(text))

    # Generate basic plot
    wordcloud = WordCloud().generate(text)

    # Serialize and output to variable
    plot0 = pd.DataFrame(data =[pickle.dumps(wordcloud)], columns =["plot"])
    OutputDataSet = plot0
    '
    WITH RESULT SETS (( plot varbinary(max) ))

    END
    GO
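Listing 1 reads a Weights column but builds the cloud from the text alone. One possible next step, sketched here under assumptions, is to turn the (Weights, MergedText) rows into a word-to-weight mapping. The sample rows and the word_frequencies helper are hypothetical and use only the standard library. The wordcloud package documents a generate_from_frequencies method that accepts such a mapping; that call is left out so the sketch runs without the package installed.

```python
from collections import defaultdict

# Hypothetical rows shaped like the Listing 1 input: (Weights, MergedText).
rows = [
    (40, "frozen prepared foods"),
    (25, "soft drinks"),
    (10, "frozen vegetables"),
]


def word_frequencies(rows):
    """Sum each row's weight over the words in its text."""
    freqs = defaultdict(int)
    for weight, text in rows:
        for word in text.lower().split():
            freqs[word] += weight
    return dict(freqs)


frequencies = word_frequencies(rows)
# Show the heaviest words first.
print(sorted(frequencies.items(), key=lambda kv: -kv[1]))
```

Splitting on whitespace is the simplest possible tokenizer; a stopword list or abbreviation expansion, as mentioned in Step 2, would slot in where the words are iterated.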

For plots, you need some way to view the object, and there
are several options:

• Save the plot to a local image file, and then copy the
file elsewhere to view it. That way, you aren’t inviting
people to open image files on the server.
• Save the binary object to a table.
• Pass the binary variable to a Python client so that it
can be read and displayed.

I recommend using a Python client to view charts. Configuring the client correctly is critical. To ensure compatibility, the Python version, the revoscalepy version, and the package versions must exactly match what’s installed on SQL Server.
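One way to check the exact-match requirement is to compare “name==version” lists from both sides: the server list can come from the package query shown in Step 1, and the client list from pip’s freeze output. The sketch below is a minimal illustration; parse_pins and version_mismatches are hypothetical helper names, and the version numbers are invented for the example rather than taken from any real installation.

```python
def parse_pins(lines):
    """Turn 'name==version' strings into a {name: version} dict."""
    pins = {}
    for line in lines:
        name, _, version = line.strip().partition("==")
        if name and version:
            pins[name.lower()] = version
    return pins


def version_mismatches(server_pins, client_pins):
    """Return {package: (server_version, client_version)} where they differ."""
    mismatches = {}
    for name, server_version in server_pins.items():
        client_version = client_pins.get(name)  # None if missing on the client
        if client_version != server_version:
            mismatches[name] = (server_version, client_version)
    return mismatches


# Hypothetical version lists, for illustration only.
server = parse_pins(["pandas==0.19.2", "revoscalepy==9.2.1", "wordcloud==1.5.0"])
client = parse_pins(["pandas==0.19.2", "revoscalepy==9.3.0"])
report = version_mismatches(server, client)
print(report)
```

The comparison is driven by the server list on purpose: extra packages on the client are harmless, but anything the server has that the client lacks or pins differently is a candidate cause of failed connections or unpickling errors.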

However, the clients provided in the Microsoft download don’t come preconfigured to use the right libraries; you’ll need to supply the Python environment definition yourself. The documentation has guidance on the properties that need to be supplied in the environment.

Set up a Python client: https://docs.microsoft.com/sql/advanced-analytics/python/setup-python-client-tools-sql?view=sql-server-2017. You can see the dialog in Figure 7.

In my case, connections from several tools failed, possibly because so many old Python clients and environments were cluttering the system. After uninstalling several Python IDEs, I installed Visual Studio 2019, Community Edition, as the client, and configured a custom environment using the tips in the documentation. It worked great, and Visual Studio 2019 has improved support for Python, but other Microsoft demos have used Visual Studio 2017 and PyCharm. Let me know what works for you!

Because my client is installed on the same computer as SQL Server, I also created a custom environment that points to the server libraries, as you can see in Figure 7. Typically, you’d never run code using this environment unless you had a problem you needed to debug.

Figure 7: Configuring a custom Python environment

The code that loads the word cloud in the client as a plot is shown in Listing 2. I ran the code using the interactive window, which opens the plot in a separate graphics window.

Listing 2: Viewing the wordcloud from a remote Python client

    import matplotlib
    import pyodbc
    import pickle
    import os
    from matplotlib import pyplot as plt
    from matplotlib import rcParams
    rcParams["figure.figsize"] = (10, 10)

    # connect to database and get plot object
    nutritiondb = pyodbc.connect('DRIVER=SQL Server;SERVER=LEGIONJ1060;DATABASE=Nutrition; Trusted_Connection=True;')
    cursor = nutritiondb.cursor()
    # From a stored procedure, use the following line
    # cursor.execute("EXECUTE [dbo].[mysp_Westernwordcloud]")
    # From a query on table with saved plots, use this line
    cursor.execute("SELECT * FROM [dbo].[pythonplots]")

    # Display each plot using matplotlib
    tables = cursor.fetchall()
    for row in tables:
        # the first column holds the pickled word cloud object
        WC = pickle.loads(row[0])
        plt.imshow(WC)
        plt.axis("off")
        plt.show()
    nutritiondb.close()

The first word cloud, in Figure 8, is much too simple, of course. I’ll want to add weights, adjust font and canvas size, and tweak the colors. However, now that I have the code in a stored procedure, it’s relatively easy to change the parameters and create different graphic objects. I can also clone working code to use other regional data sets and add metadata to the plots table. In short, the use of stored procedures makes it easy to build on existing code, pass in parameters, and generate metadata for storing with the plot objects.

Figure 8: Initial word cloud showing Midwest purchases

Advanced ML Recipes for the Enterprise
I’ve illustrated the integration of Python and R with SQL Server using a food dataset because it was an interesting domain. However, astute readers will have noted that the dataset was extremely small and didn’t really require the resources of the server. SQL Server Machine Learning can certainly be used for this type of exploration, but it’s really

required in scenarios where performance and handling of • Formally defining the business problem and the data
large data is required, such as these: required
• Defining the scope and lifecycle of the data science
• Creating models that can be processed in parallel us- project. Describing the people who are required in a
ing revoscalepy, RevoScaleR, or microsoftml algo- large data science project and their roles
rithms • Providing to partners the detailed requirements in
• Saving models to a table, which you can then reuse in terms of packages, server resources, data types, data
any server that supports native scoring SLAs, etc.
• Loading pretrained models from disk and caching for • Specifying ownership and SLAs for related operations
subsequent predictions such as data cleansing, scoring pipelines, backups, etc.
• Scaling predictions from a saved model to multiple
servers In case you’re thinking “this is all too complex for my little
• Embedding Python or R scripts in ETL tasks, using an project,” consider how many applications have started as
Execute SQL task in Integration Services demo projects but ended up in production and ran for years
with scant documentation. Given that data science projects
typically entail massive amounts of data that change fre-
Data scientists often find quently, with small tweaks to algorithms that few people
understand, best start documenting early!
themselves scrambling to efficiently
Food Is Big Business
productize and hand over the Scaling Up = Changing Your Recipe
A core value proposition for integration of SQL Server with
perfect mix of data and algorithm. an open source language like Python (or R) is to increase
Enrollment in food assistance
the limited processing capacity of Python and R to consume
programs grew from nine
percent of the U.S. population more data, and to build models on wider data sets. Revolu-
in 2007, to about 15% (47.6 Go Pro in the Kitchen with Team Data Science tion Analytics did the pioneering work on scaling R to meet
million of Americans) in 2013. Going pro with data science is more complicated than get- enterprise needs, and their acquisition by Microsoft led
ting bigger data or moving the data to a server from a file to the incorporation of R (and later Python) into the SQL
After the recession, share. Scaling up requires fundamental changes in the way Server platform.
participation has gradually you work.
declined, to 43.4 million Other solutions exist to support scaling ML, of course: dis-
people as of July 2016 and Back to the cooking metaphor, imagine a pastry chef who tributed cloud computing, specialized chipsets such as FPGAs,
40.3 million in 2018. has crafted an elaborate French pastry. Like the data scien- use of GPUs, and very large VMs customized specifically to
tist who has painstakingly selected and prepared data and support data science. However, the SQL Server platform has
fine-tuned the results using feature selection and param- the advantage of being both ubiquitous and accessible by
eters, the result is a one-off masterpiece. most developers, and it offers a well-tested security model.

Now imagine that chef being asked to turn that delight- Here are some challenges of scaling data science and solu-
ful recipe into a commodity at the pace of several hundred tions in the SQL Server Machine Learning platform:
thousand per day. The problem is no longer one of taste
and invention, but of scale and process. And because a lot Scaling up is rarely a linear process. This applies to cook-
of money rests on the results, consistency and guaranteed ing, as well as ML. A recipe is not a formula, and a model
results are critical, as well as accountability for preparation that runs on your laptop will not magically scale to millions
and cooking time and ingredient cost. of rows on the server. The training time could be quadratic
to the number of points, depending on the type of model
Data scientists often find themselves scrambling to effi- and data. The problem is not just the size of the data, or
ciently productize and hand over the perfect mix of data even the number of input features that can blow you out
and algorithm. Tasks include documenting what was done of the water. Even algorithms widely known for speed and
and why, ensuring that results are repeatable, changing the tractability with large datasets can include features that
recipe as needed to support scale and cost reduction, and greatly increase the number of computations and thus the
tracking the consistency and quality of results. time your model churns in memory.

The good news is that there’s help from the Team Data Sci- There are different ways to address computation complexity.
ence Project (TDSP). TDSP is a solution created by Microsoft Is the model big because it uses a lot of data, or because
data science teams to guide a team through development, it’s complex, with many columns and different types of fea-
iteration, and production management of a large data sci- tures? In the first case, SQL Server Machine Learning might
ence project. You can read more here: https://docs.micro- be the solution because it supports algorithm optimizations
soft.com/azure/machine-learning/team-data-science-pro- and parallel process that allow distributed processing. In
cess/overview. the second case, SQL Server and Machine Learning Server
offer ways to chunk and stream data, to process far more
data than is possible with native R or Python. You might also
Based loosely on the CRISP-DM standard, TDSP provides a collaborate with your DBA and ETL folks to ensure that the
set of templates for reproducible, well-managed data min- data is available for model training and that the workload
ing projects. The templates apply to multiple products, not can be scheduled.
solely SQL Server, and provide a structure around key data
science tasks that you can use to organize a project or com- Refactoring processes takes time but saves time in the
municate with a client, such as: long run. Functions such as data cleansing or feature engi-

42 Stone Soup: Cooking Up Custom Solutions with SQL Server Machine Learning codemag.com
neering that were run in-line as part of the model building lm(y ~ x1 + x2)
process now might need to be offloaded to an ETL process.
Reporting and exploration are other areas where the typical If you like a challenge, you could always implement it en-
workflow of the data scientist might need drastic change. tirely in T-SQL. But using R or Python sure is a shortcut! To
For example, rather than display a chart in your Python cli- see other examples of what you can do with a few lines of R,
ent, push results to a table so that it can be presented in look up “R one-liners.”
interactive reports or consuming applications.

Scoring (prediction) is a business priority. Scoring is the process of doing work with a model, of serving up predictions to users. Optimizing this process can either be a “nice to have” or a showstopper for your ML project. For example, real-time constraints on recommendation systems mean that you must provide a user with a list of “more items like this” within a second or they leave the page. Retail stores must obtain a credit score for a new customer instantly or risk losing a customer.

For this reason, SQL Server Machine Learning has placed great emphasis on scoring optimization. Models might be retrained infrequently but called often, using millions of rows of input data. Several features are provided to optimize scoring:

• Parallel processing from SQL Server
• Native scoring, in which the model “rules” are written to C, so that predictions can be generated without ever loading R or Python. Native scoring from a stored model is extremely fast and generally can make maximum use of SQL Server parallelism. Native scoring can also run in Azure SQLDB. (There are some restrictions on the type of models that support native scoring.)
• Distributed scoring, in which a model is saved to a table in another server that can run predictions with or without R/Python

Data Science for the DBA
Want to know the secret cook in the data science kitchen? It’s your DBA. They don’t just slave away on your behalf, optimizing queries and preventing resource contention. They also do their own analyses, proactively finding and fending off problems and even intrusions.

To do all this, developers often have code or tools of their own running on a local SQL Server instance. However, rather than install some other tool on the server and pass data through the SQL Server security boundary, doesn’t it make more sense to use the Python or R capabilities provided by Machine Learning Services? Simply by enabling the external script execution feature, the DBA gains the ability to push event logs to Python inside the SQL Server security boundary and do some cool analytics. Like, for example:

• Looking for a more sophisticated method of analyzing log data? Try clustering or sequence analysis in R or Python.
• Tired of “reinventing the wheel” in T-SQL? Put R in SQL functions to perform complex statistics on all your data.
• Need to detect patterns of intrusion or patterns of user behavior from event logs? Check out the process mining packages in R or Python.

For example, a fun use for Python/R in the database engine is to embed specialized statistical computation functions inside stored procedures. This bit of R code fits a multivariate linear regression model:

For some additional ideas of how a DBA might have fun with R, I recommend this book by a long-time SQL MVP and a Microsoft PM: SQL Server 2017 Machine Learning Services with R: Data Exploration, Modeling and Advanced Analytics by Tomaž Kaštrun and Julie Koesmarno. The book is a departure from the usual “data science”-centered discussions of Python and R and is written with the database pro in mind. It includes multiple scenarios where R is applied to typical DBA tasks.

Conclusion
My goal was to demonstrate that running Python (or R) in SQL Server is a fun, flexible, and extensible way to do machine learning. Moving from the kitchen to the factory is a true paradigm shift that requires coordination as well as innovation, and flexibility in the choice of tools and processes. Here’s how the new process-oriented, scalable, commercial data science kitchen works:

• Your data scientist contributes code that has been developed and tested in Python, then optimized by the new options in revoscalepy.
• Your DBA brings to the table the ability to keep data optimized through the model training and scoring processes and guarantees security of your product.
• Your data architect is busy cooking up new ideas for using R and Python in the ETL and reporting.

Stone soup? Sure, the combination of ingredients—SQL Server plus an open source language—might seem like an odd one, but in fact they complement each other well, and the results improve with the contribution of each tool, cook, or bit of data.

Jeannine Takaki-Nelson

Parameter Help
If you find any part of the parameters mysterious, I highly recommend the series by SQL Server MVP Niels Berglund on the mechanics of sp_execute_external_script: https://nielsberglund.com/2018/03/07/microsoft-sql-server-r-services---sp_execute_external_script---i/

ONLINE QUICK ID 1911081

Emotional Code
Kate Gregory
kate@gregcons.com
www.gregcons.com/kateblog
@gregcons

Kate Gregory has been using C++ since before Microsoft had a C++ compiler, and has been paid to program since 1979. She loves C++ and believes that software should make our lives easier. That includes making the lives of developers easier! She’ll stay up late arguing about deterministic destruction or how modern C++ is not the C++ you remember.

Kate runs a small consulting firm in rural Ontario and provides mentoring and management consultant services, as well as writing code every week. She has spoken all over the world, written over a dozen books, and helped thousands of developers to be better at what they do. Kate is a Visual C++ MVP, an Imagine Cup judge and mentor, and an active contributor to StackOverflow and other StackExchange sites. She develops courses for Pluralsight, primarily on C++ and Visual Studio. Since its founding in 2014, she has served on the Planning and Program committees for CppCon, the largest C++ conference ever held, where she also delivers sessions.

I’ve been paid to program since 1979 and for most of that time, I’ve been working with other people’s code. At first it was “add this little feature to something we already have.” These days, it’s “how can we be better” and “is this code worth keeping?” Reading code has always been a huge part of my job, and so I care a lot about the kind of code I (and the people I work with) write.

Of course, I want it to be fast – I’m a C++ programmer after all. It also needs to be correct, yes. But there’s more to it than those two things: I want code that’s readable, understandable, sensible, and even pleasant.

I’ve put a lot of work into looking at code and seeing how it could be better. Often, I recommend making it better by using things—language keywords, library functionality, etc.—that we’ve added to C++ in this century or even this decade. I show people how to write code that does the same thing but is clearer, shorter, more transparent, or better encapsulated. Until recently, I didn’t spend a lot of time thinking about why people wrote that code the way they did, even when there were things they could have done at the time that were clearly better. I just made the code better. In this article, I want to talk about that why factor, and about the humans who write the code you read and maintain.

What Are Emotions?
Because I mention Emotions in the title, it’s probably a good idea to discuss what they are. I think of emotions as out-of-band interrupts. Emotions deliver a conclusion without all the supporting evidence being clearly listed. For instance, you’re walking down the street, or negotiating to buy a car, or on a date, and suddenly your brain tells you something like:

• Get out!
• Trust this person.
• Smile, relax, everything’s fine.
• Fight, yell, hit, and scream!

These reactions may be right—you may be talking to a wonderful person you can trust—or wrong—the sales rep may be trying to flatter you into paying too much for the car. But the point is that at the moment the message arrives, you don’t have a nice clear list of reasons to feel that way. Your brain has done some pattern matching and delivered a conclusion. You can act on it or not.

Some people don’t like it when other people take actions based on emotions. If you become angry and leave a situation, you may not be able to explain the precise details that led to your conclusion. You may not be able to prove that leaving was right. I’ve been told that relying on emotional reactions to make decisions is lazy, non-rigorous, or cutting corners.

No Emotions Allowed
People who feel that emotional reactions are inappropriate are especially common in the field of software development. They ban the disruptive out-of-band interrupts that emotions are. They insist that you must win arguments with logic and not with feeling strongly about things. Just the pure crystalline logic of the 1s and 0s of the matrix. A lot of people really feel this way and try to suppress emotions in themselves and in others.

To a certain extent, they have a point. If we’re arguing about how many parameters a function takes, I don’t want to hear that you feel five is the right number: “It just makes me happy seeing it like that.” I want you to persuade me with logic. But for many cases, the quick overview conclusion delivered by emotions is super useful—not for winning arguments, but for doing my work. I look over an API, a whole collection of functions and all their parameters, and something in me says EEEWWWW. I don’t know exactly what is so yucky, but it draws me in there to give a closer look and to see what I rationally think about that part of the code. I’m not going to say, “pay me to re-architect your whole system because it feels kind of gross and wrong,” but that first emotional response had great value in bringing my attention to a place that needed it. I’ve learned to value those signals a great deal.

But not everyone does. When I tell them “I don’t know what specifically I dislike about this API right now; at first glance, my gut has a problem with it and I know it needs to be looked at so I’ll write you a summary this afternoon,” they may reject that because there’s nothing to argue with, they have to just trust me. They may reject it because they think I’m being weird or emotional instead of using my experience. (This is odd, because emotional and intuitive reactions to situations are how your experience generally shows itself to you.) Perhaps they may just be in the habit of telling other people not to feel emotions: not to get really happy about things, or really upset either. We have a whole strain of humor about this: “Oh, the humanity” and other memes where people are mocked for being upset over relatively small things or for being happy.

Mocking people for their “first-world problems” or replying “oh, the humanity” to them when they’re getting worked up is our way of enforcing a social norm within the programming community, especially programming communities that have roots in the 20th century rather than the 21st. The social norm says “don’t express emotions to me,” and ideally, don’t even have emotions. But here’s the thing: Programmers are human beings and human beings have emotions. Therefore, whether you like it or not, programmers have emotions.

In reality, emotions are a big part of software development. It’s a lot more than writing and debugging code, and the not-code parts of it are FULL of emotion: getting users to tell you
what they want instead of what they think they want, trusting your team, telling the truth about your limitations and dreams, being brave enough to go against the stream when you have to, keeping your integrity and values when you find yourself in a place that doesn’t share them, and much more.

I’m not here to talk about any of that today. I want to focus in on just one myth, just one not-true thing that we all tell ourselves: There are no emotions in code. That there are no messy feelings when it comes to writing code, because code is purely logical.

Your Code Shows Emotions
In fact, code is full of emotions, right there on the page for anyone to read. Most people don’t believe me, but seeing is believing. Let me show you some examples.

Fear
When you start asking why people do things, you can get some interesting answers. Take commented-out code as an example. You can’t find anyone who likes it. We rant and rail that there is source control; people can take work notes and paste the deleted stuff in there to use later; it’s just confusing when you’re trying to read later, it messes up your searches; and so on. But people do it, they do it every day. Why? Because they’re scared. “I might not be doing this right; I might need this later.”

Here’s an example from real code, found in the middle of a long function.

    //if (m_nCurrentX != g_nCurrentX
    // || m_nCurrentABC != g_nCurrentABC) {
    //}

That’s a semi-tricky “if” comparing the member variable values of things to some global values with similar names. That’s apparently no longer necessary, but the developer can’t bring themself to throw it away. They aren’t sure they could get it back if they needed it again. They feel timid and afraid about this code base, and about the consequences for not being able to find what they need later.

Another frustrating category of comments: the “don’t blame me” comment. This developer is afraid they’ll be caught doing something that another developer, or a manager, will disapprove of, so they leave a comment explaining that it wasn’t their choice, someone told them to make this change. You’ll see things like:

    limit = InputLimit - 1; //as per Bill

This developer hasn’t decided how to calculate the limit, isn’t prepared to stand behind that calculation, is telling you “hey, if you have any issues with this line of code, don’t talk to me, go and find Bill, it was all Bill’s idea.” And it’s not just that they aren’t confident how to calculate the limit, it’s also that they’re worried someone is going to object to this line and they want to defend themselves against hostile reviews. Somehow the environment has left people unable to feel confident about even the simplest calculations.

Fear is also why people don’t delete variables they’re not using. They leave in lines of code calculating something that’s never used after it’s calculated. Functions that are never called. Member variables that are never set or used. “How can I be sure we won’t need it?” the developer is clearly saying. Other times, they know they don’t need it, but they still don’t take the time to clean up. They don’t have time for that. They’re going to “get in trouble” for not getting things done quickly enough, for not getting enough things done each day, and deleting code that “isn’t hurting anyone” just doesn’t make it onto the priority list. Yes, perhaps some future developer will be slowed a little by trying to figure out what’s happening, but this developer right here is trying to make a deadline or trying not to get fired, and so the old unneeded junk stays behind.

It takes bravery to divert from patterns you see in existing code, even when you know they’re wrong, to stand up for the right way to do things. I see things like this next sample in almost every C++ codebase I meet:

    int c, n;
    int r1, r2, r3, r4;
    double factor;
    double pct1, pct2, pct3, v1, v2, v3, v4, v5;
    double d1, d2, d3;

There’s so much wrong here! None of these variables are being initialized. None are declared close to where they’re used. And the names! I can guess what factor is, but what does r stand for? I am pretty sure v just stands for value and d for double. There’s no information at all in these names. But a scared developer, a developer worried about what code reviewers will say, a developer afraid they’re putting bugs in working code by messing with it, that developer just adds d4 to the end of the list and feels pretty sure that’s the safest thing to do today.

I also see code that checks conditions that don’t need to be checked. Here’s a C++ example:

    if (pPolicy)
    {
        delete pPolicy;
    }

Deleting a null pointer is a no-op. It’s harmless. There is no need for this check but I see it all the time. (If the pointer was set to null after the delete, or anything else was done inside those braces, then it would be fine, but that’s not happening here.) I’m also likely to wonder why you’re doing manual memory management here but ignore that for now and concentrate on the mindset of someone who keeps checking things “just in case,” someone who’s paying in runtime performance every single time this application runs, because they feel tentative and unsure. Afraid.

It’s easy to say that training people and doing code reviews will teach them that delete on a null pointer is a no-op. But what if the developer is afraid their coworkers will let them down? They check the same conditions repeatedly because they can’t be sure the conditions are always met. “That was in a team-mate’s code, they might have changed it without telling me.” Here’s a place where a thorough test suite, and running those tests after every change, can improve your runtime performance. If you can confidently write the code knowing you’re getting valid parameters from the other parts of the application, think how many of those runtime checks (make sure x is positive, check that y is not over the limit, and so on) could be dropped!
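
As a contrast to the block of uninitialized declarations shown earlier (`int r1, r2, ...`), here is a minimal sketch of the same kind of arithmetic written with the three complaints addressed: each value is initialized when declared, declared next to its use, and named for what it holds. The function name, parameter names, and the formula are hypothetical; the meaning of the original variables is unknown.

```cpp
#include <cassert>

// Hypothetical invoice helper: the names and the formula are illustrative
// assumptions, not a reconstruction of the original code.
double discountedTotal(double unitPrice, int quantity, double discountPercent)
{
    // Documented precondition, checked in debug builds only.
    assert(discountPercent >= 0.0 && discountPercent <= 100.0);

    // Each value is initialized where it is declared, right before its use.
    const double subtotal = unitPrice * quantity;
    const double discountFactor = 1.0 - discountPercent / 100.0;

    return subtotal * discountFactor;
}
```

A call such as `discountedTotal(20.0, 3, 10.0)` yields approximately 54. There is no list of declarations at the top for a nervous developer to append `d4` to.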

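
The claim that `delete` on a null pointer does nothing can be checked directly. The sketch below assumes a hypothetical `Policy` type with an instance counter (neither is from the original code). It deletes without a guard, resets the pointer afterwards, and then shows that `std::unique_ptr` removes the manual `delete` altogether.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for whatever pPolicy pointed to in the article.
struct Policy {
    static int liveCount;   // number of live instances, for demonstration only
    Policy()  { ++liveCount; }
    ~Policy() { --liveCount; }
};
int Policy::liveCount = 0;

// Manual ownership: no null check is needed before delete.
void releaseManually(Policy*& pPolicy)
{
    delete pPolicy;      // does nothing when pPolicy is nullptr
    pPolicy = nullptr;   // resetting avoids leaving a dangling pointer
}

void demonstrate()
{
    Policy* pPolicy = nullptr;
    releaseManually(pPolicy);   // deleting a null pointer is harmless
    assert(Policy::liveCount == 0);

    pPolicy = new Policy();
    assert(Policy::liveCount == 1);
    releaseManually(pPolicy);
    assert(pPolicy == nullptr && Policy::liveCount == 0);

    {
        // Library ownership: the object is released at the end of this scope.
        auto owned = std::make_unique<Policy>();
        assert(owned != nullptr && Policy::liveCount == 1);
    }
    assert(Policy::liveCount == 0);
}
```

Calling `demonstrate()` runs through all three cases. `std::make_unique` needs C++14 or later.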


Sometimes fear is also why a developer does everything by hand instead of using a library. They got hurt by a library once and now they need to see it, step through it, watch it work. They can’t trust anyone else’s code and they’re willing to take longer writing it all themselves, or write naïve code that misses some edge and corner cases, out of the fear of what unknown library code might do.

Arrogance and Anger
Earlier, I showed a code snippet with variable names like r1 and v2. I’ve actually asked a developer what those names meant and been asked “aren’t you smart enough to figure out what these are?” When I ask people to explain obscure function names, the responses sound an awful lot like “Why should I explain myself to people who can’t understand it without an explanation?” You’ll see this with deliberately opaque names like f(), foo, and bar. Setting aside their origins in the dark humor of people facing wartime death, foo and bar mean “this doesn’t have a name, it doesn’t need a name, and your desire for it to have a name is wrong.” I think that’s a terrible thing to leave in code for others to read later.

It may come from arrogance, from believing you’re just better than everyone else and they don’t deserve explanations, or not wanting to take the time to provide them. It can also come from someone who is angry about something else and showing that in their code.

Sometimes this sort of “I’m better than anyone else” is what drives developers to use raw loops instead of something from a library, to write their own containers, and so on. They perhaps ran into a performance problem a decade or two ago in one popular library and concluded that they would always be better than those library writers. Although “it ain’t bragging if you can do it,” very few developers can actually outperform those who concentrate on a specific library all day long. Maybe they measured the performance of their hand-rolled solution with their unusual data against the library solution, but then again, maybe they didn’t. It’s worth checking.

Malware has more swear words in the code than non-malware.

You’d be surprised (I no longer am) how often I find sneering comments and names in live code. People who say lusers, PEBCAK, and RTFM in emails and Slack say it in their code too. They say it in their commit messages too: April Wensel found a pile of hits for “stupid user” in GitHub commit comments. Obviously, those comments are public (she found them). How do you imagine users would feel discovering a commit message that was nothing more than “Be nicer to the stupid user” when learning about a product they used? And that committer needs to try harder at “be nicer,” by the way, because that comment shows a distinct lack of nice. Steve Miller searches executables for swear words: Malware has more of them than non-malware does.

I’ve also seen function names and variable names that drip with disdain and contempt for the work being done. No, I don’t mean calling a variable “dummy.” I mean things like putting a coworker’s name in a function to show you think they’re wrong. I came across something called UndoStevesNonsense() once. Only it wasn’t Steve and it wasn’t nonsense; it was a cruder expression of disagreement. Apparently, Steve wrote some code that did things this developer didn’t think should be done, so rather than settling the design as a group or going to an authority, this person just undid those steps and named the function accordingly. Again, if you’re all about runtime performance, the thought of one developer doing steps A, B, and C (that all take time on every run) and another developer turning around and arranging for them to be undone, it’s horrifying. But it’s real in live code today.

Selfishness
Fear, arrogance, and anger don’t explain all the bad code out there. Another huge group of developers are selfish. They don’t take the time to clean up, refactor, rearrange, or rename as they go. You can hear them asking “Why should I spend my time making things easy for you?” Imagine you’re given an hour to fix something. You get it fixed in 50 minutes. If you spend 20 minutes tidying up, subsequent fixes in this area will only take half an hour. You’ll be even after only one fix in this area, and ahead for every fix after that. But if the team is measured on each fix, it’s possible someone else will be saving that half hour, while the original developer is punished for the 20 minutes extra. So, unsurprisingly, that 20 minutes isn’t taken.

Selfish code has short and opaque names because the developer didn’t bother thinking of good ones. It uses magic numbers instead of constants with names. There are side effects and consequences everywhere, like using public variables because it’s quicker, or mutable global state because it’s quicker. Sure, it might be slower next time, but next time is someone else’s problem, right?

Selfishness also leads to information hoarding. My job is safe if nobody else can do this. If I explain this to others, I’ll be less valuable because I’ll be replaceable. (As an outside consultant, my usual reaction on hearing that someone is irreplaceable is to change that first. It’s not good for teams and it doesn’t lead to good code in most cases.) This developer sees their coworkers as competitors and doesn’t want to help them. That’s not how good teams work.

Laziness
Not all programmers are selfish, of course. Some of them just can’t be bothered. “Whatever; it works. Mostly. We have testers, right?” They don’t use libraries because they resist learning new things or finding new ways to do things. They’re busy typing code. Or copying and pasting code. “Abstraction? Sounds like work to me!” When you suggest that they add some testing, or build automation or scripts, you’re likely to hear, “If you think that matters, you do it.” They don’t show any kind of commitment to the quality of the code, the user’s experience, deadlines, their own future, the success of the company, or the goals of the team. They just want to come in, type without thinking a lot, and go home, having been paid for the day regardless of what was actually accomplished. And it shows in the code!

It’s not just that they haven’t refactored, haven’t spotted useful abstractions, haven’t given things good names. You see repetition, really long functions, a mishmash of naming conventions—all things that are easy to clean up on a slow day when you don’t want to be thinking about new code. But
that time just isn’t taken. You see bugs that would be easily caught if you turned up the warning level, because turning down the warning level is a classic lazy person’s way of getting their code to compile and run. And you see disorganization and mess because the effort to think clearly about the problem and organize the code to communicate about the problem is more effort than this developer is willing to make.

Now, a warning. Some teams practice crunch. If you keep a team in crunch indefinitely, they will end up behaving in a way that is indistinguishable from laziness. They literally don’t have time to find out how to do things more quickly. They live in fear of the drop-dead deadline, that if it’s missed, they will be fired, the company will collapse, they will forever be associated with the failed project. They can’t invest an hour to save a day or a week. Nobody is letting them have the hour. They already aren’t sleeping enough and are keeping track of whose marriage has failed so far on a whiteboard in the common room. The code that comes out of crunch is rarely good code. Sometimes, that doesn’t matter. When it does matter, good teams go back and clean up afterwards, when the deadline has been met. If no one has done that and the artifacts are still in the code, I know there’s a lazy developer that’s getting away with it, or a burned-out developer that no one looks after, or perhaps perennial crunch. The bad code points directly to bad management practices.

Why Does This Matter?
So yes, your code can show emotions to me. I can see fear, arrogance, selfishness, laziness, and more, right there on the screen. In some sense, it’s true that code is only logic and has no emotions. You’re given a rule like “if today’s date is after the due date, then the item is overdue and this is how it gets processed.” There’s no emotion to that. The person doesn’t get a break on the overdue process because they’re cute, or get extra fees charged because they’re rude. But everything in the way you implement that rule, including the variable and function names you use, whether you make checking for overdue-ness a member function of something, what data type you use for dates and how you test “after,” all of that can and does carry emotion to someone who knows how to read it there.

Of course, one single-letter variable name does not a psychopath make. Sometimes calling a variable i is the right thing to do. I see emotions in patterns, not in single instances. And when you see a pattern, and learn about the team and the developers, you don’t go and find the code author and say “wow! I never knew before how scared you are all the time!” Seeing the emotional causes of bad code can give you empathy as you read and fix legacy code. I no longer yell out “what were you thinking?” as I read old code. I understand now that sometimes people were under time pressure, were being measured, were getting unpleasant code reviews, or were missing the tools we all rely on, and that put them in a mindset that produced this sort of code.

For me, gaining these insights also leads naturally to suggesting changes to a team or a workplace. That might mean that a particular team member should learn something, or that a particular management practice should be amended. Ask yourself, are your management practices causing runtime performance issues? They often do and knowing that may be enough to get them changed. It can also direct the way you try to get a particular developer to write better code. If they’re writing bad code because they’re afraid that they’ll be fired if they miss a deadline, or don’t close enough tickets each day, or get changes requested after a code review, then naturally the way to help them write better code is to reassure them about those fears. Yelling at them to get their code quality up is only going to increase their fear and will probably make the code worse. But if someone is writing bad code because they’re selfish or arrogant and not interested in making things easier for the rest of the team, they don’t need reassurance, they need reminders about what’s important to their employer.

I also keep the emotions my code demonstrates in mind when I’m writing—even one-page samples that can fit on a slide or in an article. People will copy what they see in your code base as they work on it. Do you want them to copy the fear or arrogance or selfishness they see there? Do you want them to believe that good developers use short and meaningless names?

Look Where You Want to Go
It’s tempting to conclude that you should try to take the emotions out of your code. But I don’t think it’s possible. The bad code showing that the author was afraid is timid, overly cautious, self-protective. Take that fear away and you don’t have neutral code, you have confident and capable code. The selfish code that hoards information, after you’ve done some refactoring so it explains itself and has clear names, isn’t neutral: it’s generous. Once you understand that everything you write is lighting up with good or bad emotions, why not, when you can, take the time and put in the work to show what you stand for and create code that inspires and helps others?

Confidence
You can show confidence in your code. Start by deleting things you don’t need—old code, variables whose values aren’t used, member variables that are never set or read. You have source control, after all. I take “work notes” as I tackle specific tasks, and if I rip out several lines of code that are now obsolete, I can paste them into those notes, where they’ll generally be easier to find than if they stay in the code, but commented out. As a nice side effect, my code now appears less tentative, less worried about the future, less concerned with what a reviewer will think of me.

When you’ve just added a feature or fixed a bug or otherwise been working on some code, take the time to clean up afterwards. You’ll never know more about this code than you do right now, and now is the time to record that knowledge in the code itself. Give things names that explain your thinking. Leave comments that guide people through the sharp edges if there are some. This might help you later or it might help someone else. Code like this says, “I know I’m right, let me show you.”

When you come across something obsolete, that can be done better using a new language feature, change it. (You have
tests, right?) When you come across something hand-rolled, statements instead of one. Someone might change the 50
replace it with a known-good library approach. (Again, you in the first case but forget to change it in the second. And
have tests. You can do this.) When you find a raft of variable so on. Most of all, it takes you some time to reason through
names like r1, r2, r3, change them to something better. this and figure out what the rule is as implemented here.
Let your code say, “I’m brave enough to stand up for doing Compare to this line:
things the right way.”
return (application.CreditScore > 50);
Humility
The opposite of arrogance is humility. Knowing that you’re It does exactly the same thing. It’s much shorter, and it
good is not the same as thinking you’re the best at every- can’t get inconsistent. There’s only one 50 in there—if you
thing. Code acknowledging that the next person to read is change it, you’ve changed it in all the places you need to.
likely to be a good developer who deserves an explanation You don’t leave the reader looking at your default case
is humble code. So use libraries, and include a link to the trying to imagine an integer that doesn’t meet either of
documentation somewhere. In C++, it’s easy to add a com- the first two cases. And there’s no local variable to trace
ment on the line where you #include the header file. In through the code: That’s one less moving part for the reader
other languages, you can find somewhere else to put that to hold in their head.
comment so that people copying and pasting later will paste
the link as well as the code that uses the library. Something that’s rare for me when I start to look through a
team’s existing code is that it simply compiles, links, runs,
Write gentle comments that tell why, not what. But don’t and passes the tests. Oh, it compiles, but there are warn-
SPONSORED SIDEBAR: rely only on comments. Where things aren’t obvious, you ings. Maybe hundreds of them. And the team members all
want to leave some help for the next person, and names are know there are 417 and if ever there are 418, somebody
Are Your Apps Stuck always better than comments for functions, variables—ev- needs to look into it. Or it runs, but you get an exception
in the Past? erything. When you imagine the next reader of your code, on startup, just hit Continue and don’t worry about it. Or
don’t imagine someone less than you; imagine someone it leaves a few stray files behind and you have to hand de-
Need free advice on better than you. After all future you is likely to be better lete them before you run it again. Or it passes the tests,
migrating an existing than current you, and future you is the most likely next but there are only seven of them. When I meet code that
VB6, FoxPro, WinForms,
reader of this code. compiles without warnings, runs smoothly without by-hand
Access, ASP Classic, or C++
steps, and has complete and well documented tests, I feel
application to a modern
platform? CODE Consulting
Generosity and Hard Work really looked after. Here’s someone who doesn’t have to be
has years of experience My eyes light up when I read code that’s truly well done. asked to do it right. They aren’t using tools for the sake of
migrating applications and When I see clean engineering that was done to make next tools, or for fun, but to make things run smoothly.
has experience in ASP.NET time easier, well thought-out encapsulation, and that elu-
MVC, .NET Core, HTML5, sive appropriate level of abstraction. Someone has taken I can see that sort of work ethic when I look inside the code,
JavaScript/TypeScript, the time to clean up: to refactor, rearrange, and rename too. It uses modern constructs or libraries because the de-
NodeJS, mobile (IOS things so that the code makes sense and leads me through velopers are always learning. It follows modern practices,
and Android), WPF, and the process. not just churning out code. Code like this (and the scripts
more. Contact us today to and tests that surround it) show a commitment to the fu-
schedule your free hour of Information sharing is also generous, someone who’s think- ture, to that developer’s own ease and to the team’s suc-
CODE consulting call with ing “my job is safe if we can all do this.” Their comments cess. This is the code of a hard-working programmer who
our expert consultants are enlightening, they’ve chosen good names, and they’ve doesn’t do just the minimum to skate by. Who doesn’t just
(not a sales call!). arranged the code in an order that makes sense to the read- copy what was there before, including the bad patterns and
For more information, visit er. “First we prep the order, then ship it, then update the the bad code. Who takes the time to see if now is the time
www.codemag.com/ inventory. I get it.” That sort of clarity isn’t easy, and it’s to change that thing that sort of grew organically and has
consulting or email us at generally not what you write the first time. Someone has put become unwieldy and almost unmaintainable.
info@codemag.com. in the effort to make this code good.

Sometimes code just strikes you as brilliant. It’s clearly and Choose to Show Positive Emotions
obviously right, and dramatically easy to grasp. I don’t mean So sure, your code could show fear, selfishness, laziness,
that it’s clever. I mean that it’s obvious. Consider this C#: and arrogance. But why not show confidence, generosity,
humility, and how hard working you are? Your code will be
var creditScore = application.CreditScore; easier to read and maintain. You’ll enjoy reading and main-
switch (creditScore) taining it more, and your reputation will improve as other
{ people realize they can understand what you write, it’s easy
case int n when (n <= 50): to change it when life changes, and it’s generally better
return false; code. Even if the code isn’t better, there’s a lot to be gained
case int n when (n > 50): from writing this way. But it probably will be better.
return true;
default: I want you to care about those who wrote the code you main-
return false; tain and those who maintain the code you write. When you
} find crummy code, fix it. Show your confidence. Clean up.
Make it right. Name things well. You’re going to show emo-
This code isn’t wrong. It compiles without any warnings, and tions in your code and they might as well be positive ones!
it implements the logic that’s needed. But there are three
cases (two ranges and a default) and lots of places to make  Kate Gregory
mistakes. Someone might return true in two of these return 

48 Emotional Code codemag.com


ONLINE QUICK ID 1911091

POURing Over Your Website:


An Introduction to Digital Accessibility
Ashleigh Lodge
ashleigh.lodge@gmail.com
@shimmoril

Ashleigh is the Application Development Manager at Neovation Learning Solutions (www.neovation.com/) in Winnipeg, Manitoba, Canada.

Ashleigh is a vocal advocate for accessibility and inclusive design. Earlier this year, Ashleigh was a speaker at TedxWinnipeg, bringing the idea and foundations of digital accessibility to an audience of nearly 700 Winnipeggers.

In her free time, Ashleigh consumes a truly frightening amount of pop culture media, including movies, TV shows, comic books, and novels. You can usually find her with Pokémon Go open on her phone, no matter where she is or what she’s (supposed to be) doing.

In 2010 when I was diagnosed with fibromyalgia, I did what any good nerd would: I started searching for more information online. While searching, I came across many disabled people and disability communities discussing the issues they had with accessibility—both physical and digital. This opened up a whole new world for me: Accessibility was never taught in college or mentioned in any programming job I’d ever had. I, like many people, assumed that the Web was for everyone, and it would just magically work. My eyes were opened, and I found it terribly unfair that people were struggling so much to access the greatest resource of human information ever. Wasn’t the Internet supposed to be the great equalizer?

Accessibility Standards
Let’s start with an introduction to some of the standards that govern Web accessibility—there are several. You’ve likely heard of the W3C, or the World Wide Web Consortium (https://www.w3.org/), the standards body for the Web. The WAI is a part of the W3C that specifically deals with accessibility, as the Web Accessibility Initiative (https://www.w3.org/WAI/). There are two primary standards developed by the WAI: WCAG 2.1 and ATAG 2.0.

The Web Content Accessibility Guidelines (WCAG) (https://www.w3.org/TR/WCAG21/) are currently at version 2.1, released in June 2018. WCAG deals with accessing content—think about reading a newspaper or magazine article online or watching videos on YouTube.

The Authoring Tool Accessibility Guidelines (ATAG) (https://www.w3.org/TR/ATAG20/) are a newer standard; v2 is from 2015. ATAG applies when tools are provided to create content, i.e., writing a post on Medium or creating an Instagram story.

You’ll notice that many familiar services are bound by both standards. If you provide a product that allows users to create and post their own content, you’ll need to consider ATAG for content creation and WCAG for content display. I’ll focus on WCAG in this article, because there are far more people viewing and reading content than there are creating it, even when you consider social media.

WCAG Overview
WCAG is broken down into four guiding principles: Perceivable, Operable, Understandable, and Robust, often abbreviated as “POUR.”

These principles are key to understanding digital accessibility: If you remember nothing else, try to keep POUR in mind. There are three WCAG ratings, single A (A), double A (AA), and triple A (AAA). In general, you’ll be aiming for AA, except in cases where AAA is easily attainable with a little extra effort. If you’re new to accessibility, A may seem like a good place to start, but be aware that level A criteria are seen as the barest of the bare minimum; not even a real improvement over forgetting or disregarding accessibility altogether.

Perceivable
Let’s start with perceivable, the first step for accessibility and the foundation for the other three elements. After all, how can you operate or understand something if you can’t perceive it?

All input to our brains comes via one or more senses, with seeing, hearing, and touching being the primary ways for humans to take in and convey information. A person who’s blind or visually impaired needs to use another sense, such as hearing, to access information that sighted people get visually. You may have heard of screen readers, which are tools that literally read out the underlying code of a site so a blind or low vision user can access it.

Screen Readers
The most common categories for disabilities are pain, flexibility, and mobility, followed by mental health, dexterity, and hearing, and then seeing, learning, and memory. Although screen readers are essential tools, focusing only on screen reader users means that you’ve ignored six other more common disability categories completely. And of course, disabilities have multiple aspects—over 75% of disabled people reported in more than one category.

If you’re curious about how screen readers work, I can almost guarantee that you have one in your hand or pocket right now. All Apple and Android phones and tablets come with one built in, and they’re pretty easy to pick up and start using. Take a look at the settings and enable VoiceOver (Apple) or TalkBack (Android) and give it a try for yourself. What’s the same and what’s different about accessing sites and apps this way? Can you access them at all?

Semantic HTML
For Web accessibility, properly semantic HTML is key for many reasons; knowing how screen readers work can help you understand why. These tools present much more of the underlying structure of a site than a sighted user may expect. For example:

• A screen reader describes a link with the text of “All” as “All, link” or “All, visited link” depending on the state.
• A screen reader announces specific elements such as buttons and headers along with their level (i.e., h1, h2, etc.).
• For images, screen readers attempt to read out a description of the image (alt text), if one’s provided. If not, the file name of the image is used instead, even if it’s obfuscated or otherwise confusing and unhelpful.
• Most screen readers ignore non-semantic elements, such as divs and spans.

The div example brings up a common scenario: If you’ve done any Web development, I’ll bet that you’ve seen cases where

50 POURing Over Your Website: An Introduction to Digital Accessibility codemag.com


a div has been styled to look like a button, instead of using the actual button element. This is problematic accessibility-wise for a variety of reasons. A screen reader or other assistive device may skip it completely, confusing users. Using the wrong element may cause assistive technologies to announce it incorrectly, which causes frustration. Or, your button may be inaccessible by keyboard—and many disabled people rely on keyboard-based navigation. Ultimately, you lose all of the built-in benefits available with a semantic button.

Semantic elements such as buttons, links, inputs, checkboxes, etc. provide information to a browser that gives context to the element, even when it stands alone. Non-semantic elements, such as divs and spans, don’t provide this contextual information to the browser, and therefore they do not provide necessary information to assistive devices or to users.

Color
Color is a big aspect of perceivable. A large number of people are color-blind or have low vision, so in order to have accessible content, you must use colors that provide sufficient contrast between the background and text (or image) colors. The WCAG AA standard for color contrast requires a ratio of 4.5:1; for AAA it is 7:1.

Luckily, you don’t need to do the math yourself! There are a number of tools that let you input the background and foreground colors in hex, RGB, and HSLA formats and receive the contrast ratio. Some tools even take font size into consideration—a larger font can have a lower contrast ratio than the general guideline and still be AA or AAA compliant.

Using color alone to convey meaning causes accessibility issues as well. Someone who’s red-green color blind may not be able to tell the difference between a grade written in green (i.e., a pass), and one written in red (a failure). Adding another indicator, such as a thumbs-up or thumbs-down icon, provides additional context, and makes the site more usable for everyone, not only users with disabilities. Be careful with icons though—solely using icons (no text) can cause issues for people with cognitive disabilities or non-disabled users who have a different cultural background.

Transformable
In order for content to be perceivable, it has to be transformable as well, to support users with different abilities. In general, you’ll want to provide text alternatives for images, video, and audio.

• For images, transformable content is called alt or alternative text, and it’s a description of an image that conveys the necessary meaning to a screen reader user. You may also need to consider whether this image is purely decorative, in which case you provide an empty alt text attribute to hide it from screen readers.
• With videos, transformable content can mean captions or described audio, which are distinct and have different features. Captions transcribe speech; a similar feature you may be familiar with is subtitles on a TV show or movie. Described audio is an audio description of everything appearing on the screen. Think of it as telling a friend about the amazing hotel room you had while on vacation: “When I turned the corner, and saw the room for the first time, it was HUGE! There was a sunken coffee table area and…”
• When you have purely audio content (like podcasts), a transcript is required.

Did you know that all the major social media platforms have a feature where you can add alt text to your images as you post them? It’s not enabled by default (which could be considered a violation of ATAG), but check out the settings on Facebook, Twitter, and Instagram and start adding alt text to all your images. Unfortunately, you can’t add alt text to a gif on these platforms, but you could include a description as part of your content. After all, you never know when you’re going to go viral, and everyone really should be able to experience your brilliance and wit.

Operable
Next up is operable. How do you usually move around online? Probably with a mouse or by tapping on a phone or tablet, right? If you have to enter information, you use a keyboard.

Some users, such as those with Parkinson’s, may not have the fine motor control needed to use a mouse. In other cases, people with limited mobility may use assistive technology that replaces their mouse or keyboard entirely. These users can also have difficulty with areas that are too small to tap accurately, particularly on a phone or tablet.

Operable generally refers to keyboard accessibility, because most assistive devices mimic keyboard functionality. This also makes operable one of the easiest principles to test: I dare you to disconnect your mouse and try to navigate through a site you use every day.

• Is it always clear where you are on a page (i.e., the focus state)?
• Can you interact with all the controls you normally would—things like menus, buttons, and drop-downs?
• Is it possible to bypass irrelevant or repetitive sections easily, such as the header or navigation, via a skip link?
• Is the content structured in a consistent and meaningful way, with labels always positioned before, and properly associated with their controls (i.e., can you interact with a control by clicking or tapping on the label)?
• And what about hotkeys? You probably already know about using the backspace key instead of the back button, but do you know about the hotkeys specifically for, say, the Twitter Web client, or do you have to figure them out by trial and error?

Speaking of errors, users will make them, so it’s crucial to provide ways for users to recover from these errors. For the most part, this isn’t too difficult: If someone makes a post by accident, you can allow them to edit it or delete it and start again. But what if a user transfers $1,000 to the wrong account? This can be much harder (or impossible) to undo. Things like confirmation screens, alerts, and warnings benefit all users, not just those with disabilities. Even better, provide instructions before users start a long or complicated process; these help users avoid errors from the start.

These examples show inclusive design principles—rather than designing a product and only then figuring out how it might work for disabled people, you can intentionally design products that work for all users, including those with disabilities.

Giving users control over timing and time limits is also crucial to operability—I’m looking at you, Ticketmaster! Consider the situation where you’re trying to buy tickets for your favorite band and there’s a time limit on the process. Now think about the experience of someone with a cognitive disability who has a slower response time. And what



if they’re also using an assistive device, which could also increase their response time.

This kind of control also applies to animations and videos: An epileptic user needs to be able to stop or pause a rapidly flashing animation or video before it causes a seizure, and all users benefit by being able to scan through a video to see what they missed (or skip ahead to the good parts).

Understandable
Once your content is perceivable and operable by a wide variety of users, you need to ask whether they can understand it. Language choices matter to accessibility: What counts as “understandable” depends on your audience, but unless you’re 100% sure about education, background knowledge, life experience, and culture, it’s best to default to very simple and concise text.

Basically, you don’t want to use a long word when a short one would work just as well—and probably better. For example, instead of saying additional, say added. Instead of contains, say has, and rather than expiration, just say ends.

Functionality also needs to be understandable. At some point, developers decided that the hamburger icon (three stacked horizontal lines) would represent menus, and people have become familiar with this icon. So, unless there is a very good reason to break from the standard, you should also use the hamburger icon to represent menus on your site. If you do decide to do something unique, it’s important that you’re at least consistent within your own site. Don’t use both the hamburger and your custom icon or you’ll just end up confusing your users.

Forms often present problems with understandability. For example, when the Save or Next button is disabled, it’s not always immediately clear what a user needs to do to enable that button. Rather than force users to poke around the form, filling in values at random until the button becomes active, consider accessibility and inclusive design. Although it’s possible that someone without a disability may figure out how to get the form working more quickly than someone with a disability, it’s a terrible experience for both users.

Robust
Last, but absolutely not least, we have robust. Does your site work in Chrome? Probably. What about Firefox? Yeah, probably that one too. What about IE11 and Edge? Safari, UC Browser, Brave? Does your site work on older versions of these browsers? What about the developer/canary versions? Does your app work on an iPhone 5? What about an iPhone X, with the notch at the top? What about allllll the different types of Android devices?

The key for robust is that unless there is a critical compatibility issue, everything should just work. Users need to be able to choose their own technologies in order to meet their own unique needs. I can feel you glaring at me through the page: This doesn’t mean that you must support every version of every browser out there, just that you shouldn’t be too restrictive.

Having a message on your site that it only works in the latest version of Chrome doesn’t do anyone any good. When thinking about inclusive design as well as accessibility, consider the user who is at work and doesn’t have permission to install the latest version of Chrome, as well as the disabled user who can’t use Chrome with their assistive technology.

There are standards for HTML and CSS and you should be following them. Not only is this good for accessibility, it also provides a certain amount of future-proofing. When Apple first announced that websites would be available on the Watch, they noted that any sites using standard HTML would just work. Other sites, well they were in a bit of trouble.

Accessibility with ARIA
So how do you make Web content accessible when there’s no standard? For example, there’s no semantic HTML element for tabs, so you must create your own visual representations using lists and divs.

Fortunately, there’s a tool called ARIA—Accessible Rich Internet Applications—that’s meant for cases when you have to go above and beyond the existing standards. The first rule of ARIA is that you don’t use ARIA.

Let me explain: Bad ARIA is actually worse than no ARIA at all. When you have a non-standard HTML element or have used a standard element in a non-standard way, you may cause confusion or problems for disabled people using assistive technologies, but it’s likely that they’re familiar with these types of problems and can work through it. However, if you have added ARIA without knowing exactly what you’re doing, it’s very likely that you’ve made that element, or possibly your whole site, completely unusable.

ARIA should be used in three cases:

• When you have dynamic content. A common pattern in Web development is to create a single-page application (SPA), where the page never actually refreshes. This is problematic for screen reader users, because they have no indication that the page contents have changed or have been updated.
• When you need to expose your content to either sighted users or assistive device users, but not both (or both but in different ways). For example, you wouldn’t want your skip link to be visible to all users, but it should be focusable by anyone using assistive technology.
• When you have advanced UI elements. As I mentioned above, there’s no standard HTML element for tabs, so you create these controls using lists, divs, and a whole bunch of CSS to make them appear visually like tabs. But what about screen reader users? You can use ARIA to indicate the relationship between a list item and a div, as well as the current state of the div (i.e., is the tab active/visible).

ARIA is an extremely powerful tool that allows you to hook directly into the accessibility tree of a browser. The accessibility tree is basically the DOM (document object model), but for accessibility. All styling is removed, any elements that have been hidden from assistive devices aren’t included, etc. There are three aspects of ARIA that affect how an element is presented in the accessibility tree: role, relationship, and state.

Role
Semantic and interactive HTML elements have a role based on the type of control—button, checkbox, heading, etc. Non-interactive HTML elements can be given a role via ARIA, and the default role of a semantic element can be overridden. Consider this snippet of code:



    <h1 role="button">
      This is a heading - or is it?
    </h1>

Here, I’ve used a semantic element, a heading, but I’ve also provided an ARIA role of type button. Visually, this will be rendered as a heading. However, as far as the accessibility tree is concerned, this code represents a button. Screen readers and other assistive devices will announce this as a button, but the user won’t actually be able to interact with it as a button—there’s no click event or handler, no hover state or tooltip.

And of course, I can also do the opposite:

    <button role="heading" aria-level="1">
      My Button?
    </button>

Again, I’ve overridden the role of a semantic element (in this case a button), so the accessibility tree believes it’s a Level 1 heading. You can see why it would be better to have no ARIA at all, rather than misleading and just plain wrong ARIA.

Relationship
ARIA can be used to describe the relationship between two elements. This is similar to setting the for attribute on a label, which links the label and the control together, as far as the browser is concerned.

If you think about the tab example again, you can use ARIA to link the first element in a list (i.e., the “tab”) to the appropriate div. This tells the browser—and the user, ultimately—that clicking on a specific tab activates or displays a specific div.

Another use for ARIA relationships would be for a description of a chart’s data. In this case, you’d display your chart, and then have a visually hidden div or span linked via the aria-describedby attribute. When a screen reader user reaches the chart, the contents of the div will be read out because the two elements have been linked together in the accessibility tree.

    <canvas aria-describedby="chartDesc"></canvas>
    <div id="chartDesc" style="display:none;">
      This is a description of the data
      in the chart.
    </div>

State
If, for some reason, you decide that you need to create a custom checkbox instead of using the default one, you could use ARIA to expose the current state (checked or unchecked).

    <div role="checkbox" aria-checked="false">
      On
    </div>
    <div role="checkbox" aria-checked="true">
      Off
    </div>

Please, please never do this! This is an extremely simplified example—you’d need at least five other attributes on this custom checkbox to make it properly accessible. Or, you could just use the standard, semantically correct checkbox input element instead and get all the functionality automatically. Up to you!

Next Steps
Knowing anything about accessibility—even just a vague idea of what it means—puts you in a better position than many people out there, unfortunately. I urge you to keep accessibility in mind, remembering that small changes are important and they do add up.

When you’re thinking about which library or framework to use; when you’re considering performance enhancements; when you’re writing a spec doc, designing a mockup, or writing a test script—just put the word accessibility out there. Maybe it’ll force you to reconsider some assumptions. Maybe you’ll realize that you don’t need a library at all, because there’s a native HTML element that does what you need. And you know what? Maybe nothing will change. But sometimes it will. In the future, you’ll consider another aspect of accessibility. Build an accessibility toolbox, just as you would with any skill.

Maybe you’re thinking that you don’t have time for accessibility. However, using properly semantic HTML is much quicker than using custom elements, so you’ll likely be able to gain some time that way. Then think about all the times you’ve given an estimate for finishing a task and completed it quicker than expected. Now you have an extra half hour or more to dedicate to accessibility, and tasks like checking the contrast ratio, adding alt text to images, and verifying keyboard navigation take very little time.

Don’t forget about your internal tools and systems: Just because you’re not selling something doesn’t mean that accessibility doesn’t matter. Could you hire a visually impaired developer tomorrow and have them use the same tools as your current team?

I’ll also ask you to think about physical accessibility, which is still a problem and concern for many disabled people. When you go to meetups or conventions, ask if they’re wheelchair accessible (and make sure they’re truly accessible, not “oh, there’s just one small step, I’m sure it’s fine”). Ask if they have accessible and gender-neutral washrooms. Ask if there are accommodations for visually impaired, blind, or deaf users. If they’re serving food, ask if there’s a process for handling allergies and sensitivities.

Disabled people are already very aware of places they’re not welcome, which allows those venues to say, “disabled people don’t come here, they’re just not interested in what we provide.” In reality, disabled people just don’t want to deal with the frustration and possible humiliation of not being able to access a building or event. However, if abled people start asking these questions, accessibility will be presented to the venue as something that’s desirable and changes (may) happen.

I’m here to put accessibility into your heads, and I hope you’ll do the same with others in your communities.

Ashleigh Lodge

codemag.com POURing Over Your Website: An Introduction to Digital Accessibility 53
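
A supplementary note on the Color section of the accessibility article above: the 4.5:1 (AA) and 7:1 (AAA) ratios quoted there come from a formula short enough to sketch. The TypeScript below is a minimal sketch, assuming 6-digit hex colors as input. It follows the relative luminance and contrast ratio definitions published with WCAG 2.1. The function names (parseHex, relativeLuminance, contrastRatio, meetsAA, meetsAAA) are illustrative only and do not come from the article or from any library.

```typescript
// Illustrative sketch of the WCAG 2.1 contrast-ratio calculation.
// All names here are hypothetical helpers, not a library API.

type Rgb = [number, number, number];

// Parse a 6-digit hex color such as "#1a2b3c" into 0-255 channel values.
function parseHex(hex: string): Rgb {
  const match = /^#?([0-9a-f]{6})$/i.exec(hex.trim());
  if (!match) {
    throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  }
  const value = parseInt(match[1], 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

// Relative luminance per WCAG 2.1: 0 for black, 1 for white.
function relativeLuminance([r, g, b]: Rgb): number {
  // Convert an sRGB channel (0-255) to its linear-light value.
  const linear = (channel: number): number => {
    const c = channel / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b);
}

// Contrast ratio between two colors; the order of the arguments does not matter.
function contrastRatio(foreground: string, background: string): number {
  const l1 = relativeLuminance(parseHex(foreground));
  const l2 = relativeLuminance(parseHex(background));
  const lighter = Math.max(l1, l2);
  const darker = Math.min(l1, l2);
  return (lighter + 0.05) / (darker + 0.05);
}

// Thresholds for general (normal-size) text, as quoted in the article.
function meetsAA(ratio: number): boolean {
  return ratio >= 4.5;
}

function meetsAAA(ratio: number): boolean {
  return ratio >= 7;
}
```

Calling contrastRatio("#000000", "#ffffff") gives the maximum possible ratio of 21:1, and two identical colors give 1:1. The sketch deliberately ignores font size, so it applies only the general thresholds; as the article notes, some tools relax the requirement for larger text.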


ONLINE QUICK ID 1911101

Best Practices for Data Visualizations:


A Recipe for Success
Over the last decade, many companies debuted data visualization platforms that enable organizations to analyze trends and
understand their businesses better. Tools such as Microsoft Power BI, Tableau, Amazon QuickSight, and QlikView represent just a
few of the many potential applications businesses could leverage. Given that CODE Magazine focuses on computer programming,

many of us easily fixate ourselves on the potential capabili- • Choosing visuals to convey strategic points
ties for creating queries, developing custom programming • Positioning and the number of charts/visuals
code, or creating extensive calculations on the back-end be- • Instruction prompts
hind the scenes. However, the majority of business users use • Colors to point out key trends
these data visualization platforms to dynamically interact
with this top layer of application: the dashboard. To develop Users want you to do the analysis of the components before-
these user-friendly dashboards, we need to think more like hand, but they want to interact with the data themselves
designers instead of programmers. using the instructions you provide. If the pieces don’t fit
together, or if the instructions don’t make sense, building a
Helen Wall successful final product becomes much more difficult.
www.helendatadesign.com The Importance of Dashboard Design
Helen Wall is a power user of The best-designed and implemented dashboards appear The Starting Dashboard
Microsoft Power BI, Excel, and effortless; we like them more, but yet can’t quite explain I chose Tableau for this example because I think it enables
Tableau. The primary driver exactly why. Their likeability comes not by accident, but by a focus on making changes with best design practices in
behind working in these tools mindfully following design principles implemented with the mind, given that the tool focuses on the visuals themselves.
is finding the point where end viewer in mind. Good design isn’t an accidental result, It also works on Apple products at the time of publication
data analytics meets design but rather a strategic decision to choose to make the small for this article, and Microsoft Power BI, unfortunately, does
principles, thus making data design choices that have a huge impact on the end result. not. You can use different versions of Tableau, but for the
visualization platforms both an purposes of this article, I’ll use Tableau Public Desktop,
art and a science. She considers Nudging which is free to download. You can download the starting
herself both a lifelong teacher In order to maximize the likelihood that users make your file from Tableau Public Online to follow along with the
and learner. She is a LinkedIn dashboard part of their everyday processes, you need to changes I make in this article, and then save it by uploading
Learning instructor for Power design dashboards that guide users through not only key it to your own Tableau Public Online account.
BI courses that focus on all as- visuals and figures, but also how they collectively interact
pects of using the application, with each other. A dashboard can check all of the user’s I obtained the infant mortality data from the impressive
including data methods, dash- requirements, yet still not yield much value to the user be- data section of the Gapminder website developed by the
board design, and programming
cause there’s a dissonance between what they tell you they late Hans Rosling. The data set contains the infant mortal-
in DAX and M formula language.
would like and how their actual behavior responds to the ity rates by country and by year from 1800 until 2015. Note
Her work background includes
an array of industries, and in
dashboard. How do you bridge this gap? Let’s put this dis- that there are incomplete data sets because some countries
numerous functional groups, cussion in the context of the nudging proposition. may not have data (or at least useable data) for all of these
including actuarial, financial years. I categorized each country into its own region of
reporting, forecasting, IT, Nudging is defined as tiny prompts that alter our behav- the world, as you can see in the region mapping key file in
and management consulting. ior—specifically social behavior. Richard Thaler extensively Figure 1.
She has a double bachelor’s examined nudge theory in his book Nudge. Examples of
degree from the University of nudging techniques include: The Gapminder site defines infant mortality as the number
Washington where she studied of deaths in the first two years of life for every 1000 live
math and economics, and also • Encouraging recycling, by placing a larger recycling births. Although you may think this project is taking a mor-
was a Division I varsity rower. bin in a more prominent location than a smaller gar- bid direction, you’ll see that the Tableau dashboard helps
On a note about brushing with bage bin in many cities and businesses. communicate a much more positive outcome. A decrease in
history, the real-life characters • Sending out electricity bills that compare usage to these rates means that the survival rates are increasing and
from the book The Boys in the that of neighboring housing units to encourage users improving global health outcomes.
Boat were also Husky rowers to limit their electricity usage.
that came before her. She • Charging even the nominal amount of a few cents for To walk through the steps of applying best visualization de-
also has a master’s degree in single-use plastic bags encourages people to bring sign practices, let’s begin with a less-than-optimal Tableau
financial management from their own reusable bags when shopping or not use one dashboard I created, as seen in Figure 1. I can make stra-
Durham University (in the at all. tegic design and formatting changes that transform it into
United Kingdom).
a more effective dashboard built with the end user in mind.
Much like Ikea furniture comes in a box with pre-cut pieces
and instructions for assembly, you want to present your You will need several components to try this out on your
dashboards to the user in a similar manner. Think of the in- own, including the Tableau Public dashboard links below
struction manual as the nudging component to the product. and the link for the two Excel files and the PNG image that
This translates to techniques in designing data visualiza- are available on the CODE Magazine page associated with
tions (which I’ll discuss later) such as: this article.:

54 Best Practices for Data Visualizations: A Recipe for Success codemag.com


• Starting Tableau dashboard: https://public.tableau.com/views/VisualizationBestPracticesstartingfile/Dashboard1?:embed=y&:display_count=yes&:origin=viz_share_link
• The Excel file from Gapminder data with the infant mortality rates
• The Excel file for country to region mapping
• The Gapminder logo
• Ending Tableau dashboard: https://public.tableau.com/shared/8TZ3K2TWW?:display_count=yes&:origin=viz_share_link

Figure 1: Initial infant mortality dashboard

You need to update the Tableau file to point to the Excel file on your own dashboard:

1. Download the free Tableau Public Desktop application, or use Tableau Desktop if you already have it.
2. Download both Excel files and the Gapminder logo to a folder in your own desktop or documents folder.
3. Open up the Tableau link for the starting dashboard with your own Tableau application.
4. Go into the Data Source tab of the Tableau file, click on each of the connections for the rates and region key, and set the folder connection to the path on your own desktop.

After you update the sources, the rest of the visualization will update as well, and you can begin the transformation process.

The “Ikea Effect”
Looking at Figure 1, can you tell at first glance what the initial dashboard is analyzing? Inconspicuous legends and axis labels serve as the only indications that you’re studying infant mortality rates. The user shouldn’t have to guess at what you’re trying to do.

Building successful data visualizations involves striking a balance between giving dynamic options to the viewer and your own design and analysis process. When viewers feel like they do most of the work by interacting with the dashboard and they enjoy the process, you’re telling them that you value their ability to analyze the data trends and take ownership in this process. Psychologists define this as the “Ikea effect” (https://www.bbc.com/worklife/article/20190422-how-the-ikea-effect-subtly-influences-how-you-spend) where the customers (in this case dashboard viewers or users) feel they achieve the greatest value for their investment. This is the Holy Grail for many businesses.

On the flip side, you need to do a lot of work on your end to get the users to feel this empowerment and ownership in the process. Making the process easy for the user involves putting yourself into their thought patterns to analyze the unknowns and numbers before they even see them in the dashboard. The areas for you to analyze beforehand for them include:

• The meaning and magnitude of data set numbers
• The relationship between data points and data fields
• Optimal ways to see the data in visuals and charts

How do you want to measure the infant mortality rate? Does a higher rate indicate a better or worse metric? You need to establish that you want to see lower infant mortality numbers because this means that more babies are surviving out of infancy, which also indicates improving public health outcomes. You can’t assume that the reader already knows this, and you need to explicitly say what the numbers mean in context of the bigger picture.

Furthermore, you also need to indicate how you’re aggregating these infant mortality numbers. Each point in the data source represents the infant mortality for a given year and country. If you wanted to determine the global infant mortality rate for 1990, for example, you need to analyze the data points for all of the countries that year. It doesn’t make any sense to sum them together because they represent rates and not absolute values. It makes more sense to average all of these data points to get the global mortality rate, as a single aggregated number, for the years and countries you want to see.
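The point about summing versus averaging rates can be checked with a few lines of Python. This is only a sketch: the region names and rate values are made up for illustration and are not Gapminder figures, and the unweighted average treats every country equally regardless of population.

```python
from statistics import mean

# Hypothetical infant mortality rates (deaths per 1,000 live births) for one year.
# Region names and values are illustrative assumptions, not Gapminder data.
rates_by_region = {
    "Region A": [5.0, 6.0, 4.0, 5.0, 5.0],  # five countries reporting, low rates
    "Region B": [10.0, 12.0],               # two countries reporting, higher rates
}


def summarize(rates):
    """Return the sum and the unweighted average of the rates for each region."""
    return {
        region: {"sum": sum(values), "average": mean(values)}
        for region, values in rates.items()
    }


summary = summarize(rates_by_region)
for region, stats in summary.items():
    print(region, stats)
```

Region A ends up with the larger sum only because more countries report data, while Region B has the higher average rate. The average is the number that actually answers the question about where infant mortality is worse.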



Figure 2: Updated bar chart

If you buy furniture from Ikea, you assemble it yourself from pre-cut pieces that come in a box that designers planned out and tested ahead of time. Similarly, in dashboards, you want to analyze the data and plan out the dashboard before passing it off to the user to interact with in a pre-packaged box in the form of a dashboard. If you don’t include a necessary piece or if the sizing doesn’t work, neither you nor the viewer gets the desired finished product or result. Much like Ikea furniture comes in a box with pre-cut pieces and instructions for assembly, you want to present your dashboards to the user in a similar manner. Users want you to do the analysis of the components beforehand, but they want to interact with the data themselves using the instructions you provide. If the pieces don’t fit together or if the instructions don’t make sense, the likelihood that they will embrace this dashboard as their own goes down substantially.

Good Design Isn’t an Accident
How do you approach designing your best possible version of the dashboard? What do you consider for the job at hand? You want to analyze the data initially to create components for the dashboard that fit together with each other. Much like an Ikea furniture pack, you design and test the pieces to make sure they fit together beforehand.

When you like the way something looks, you can’t always quite explain why. The “why” comes through your decision to strategically choose to apply design principles to it. Designing an effective dashboard that users embrace interacting with is not an accident, but rather a well-planned approach that keeps the user in mind.

Choose the Right Visuals for the Job at Hand
The first step in this process is selecting visuals that represent the data correctly, and also effectively communicate the results and trends in the data. There’s no one chart that works for all data and no one data set that works for all charts. I encourage you to experiment with charts within the data visualization application to compare how they represent the data and what visual works best for your intended result.

Effective chart options include:

• Bar charts
• Line charts
• Scatter plots
• Box and whisker plots
• KPI metrics
• Heat maps
• Highlight tables

You can see the infant mortality rates by region represented as a pie chart in the upper left-hand corner of Figure 1. This chart presents two big issues when you view it:

• You can’t easily distinguish between the slices of the pie because you have to guess the angles rather than actually measuring the numbers.
• The chart represents the infant mortality rate as a sum of the rates, which can mislead the audience because regions with more data points and complete data will have a bigger slice, even if they have lower infant mortality rates.

The bar chart (Figure 2) does a much more effective job of showing the average infant mortality rate by region (changed from the sum aggregation in the pie chart), and you can easily rank and compare these rates between regions directly within the chart.

To change a pie chart into a horizontal bar chart:

1. Move the “Region” dimension to rows.
2. Change the chart from a pie chart to a horizontal bar chart in the Show Me options menu.
3. Change the aggregation of the infant mortality rate from Sum to Average.

You also lose the time dimension this way because you’re measuring the average infant mortality rate for all years for countries within each region in Figure 2 rather than in a certain year.

Because you want to measure a time value, you can use a line graph or a bar graph. What if you took this infant mortality data by region and put the trends in a bar chart using a time dimension x-axis? You can see the results of this visual in Figure 3, where each region is distinguished within each year on the x-axis with a color as well as a label.

Figure 3: A vertical bar chart over time and region

This chart option still poses some issues, including:

• Because you now have both the region and time on the x-axis, the axis becomes very long and difficult to read. Even adjusting the fit, you still need to process a lot of data points.
• If you compare the trends by year for a certain region (say Europe), how can you quickly tell if the infant mortality rate improves from the previous year?

To change from a horizontal bar chart (Figure 2) to a vertical bar chart (Figure 3) with time on the x-axis:

1. Move the Regions dimension to columns and the average Infant Mortality aggregation field to rows, and you now see the chart automatically update.
2. Add Years to the columns in front of the Regions, and you now see a bar chart with a very long x-axis.
3. Take the Regions dimension from the fields and place it on the Color Marks card, and you now see regions in two places of the chart: one for the x-axis and one for the color.

You can’t stack up the region bars in Figure 3 because you want to average rather than sum the rates. Showing the average infant mortality rates by region as a line chart mitigates the size and readability issues that you encounter for this scenario with bar charts.

The line chart in Figure 4 allows you to easily rank the regions for each year, and you can see the infant mortality rate trends by region because each point in the graph joins to the point for the next year and the previous year and so on. More importantly, you can also easily see that the rates trend downward across all regions, which means that global health outlooks are improving, even if you continue to see disparity among the regions.

To convert from a vertical bar chart into a line chart: In the Show Me options menu, select the line chart. You can see the updated chart in Figure 4.

The world map you saw in Figure 1 represents the infant mortality rate by country, with the color representing the region, and the size of the bubble representing the sum of the infant mortality rate for that country across all years. Notice that the map shows the bubble size as the sum of the infant mortality rates, which misleads the viewer, so you’ll need to update the aggregation to average across all years for each country. You may also find it difficult to distinguish between the sizes of the bubbles or to determine trends or discrepancies within a region because the bubble sizes are small to start with on the dashboard.

I changed the map type to the filled map you see in Figure 5, where the darker colors represent higher infant mortality rates over a two-hundred-year range, and the lighter colors represent lower infant mortality rates. You can also see higher rates concentrated among neighboring countries in sub-Saharan Africa. Notice that the visual automatically dropped the region dimension completely from the map visual. The filled map makes it easier to compare rates between neighboring countries because you can see color differences much more easily than bubble-size differences.

To change from a bubble chart (Figure 1) into a shaded filled chart (Figure 5):

1. Select the Show Me option menu and pick the maps chart icon.
2. Change the aggregation from Sum to Average for the Color Marks card option.

Now I’m going to tackle the two charts you saw in Figure 1 that show the infant mortality rates by country as a bar chart and the average infant mortality rate by year as a data table by combining them into a single visual rather than two, which makes the data easier to read. You can see some key



pieces of information in both graphs, such as easily identifying the countries with lower infant mortality rates and also that the infant mortality rates go down substantially over the last fifty years.

Figure 4: A line chart over time with colors for region

Figure 5: Filled map

Figure 6 shows a data table summarizing the infant mortality rate with the countries on the row labels, the years in the column labels, and the corresponding values effectively as cell coordinates in the middle. If you looked around the Gapminder Excel file storing this data, you might notice that it looks like the original data set with the countries listed alphabetically in the rows and the years listed chronologically as columns at the top. Because there’s only a single rate for each of these coordinates, it doesn’t technically matter whether you select sum or average as the aggregation.

You also need to ask yourself if you think that the table visual in Figure 6 serves as the most effective way to easily view and analyze the data. Although you may find it nice to more easily see with numeric values how the rates are improving for each country over a two-hundred-year time frame, looking at a lot of numbers without visual cues or assistance can get fatiguing.

You can still use the idea of a table, but what if you use a highlight table instead, as you see in Figure 7? This



visual shows both the rate as a text value and a color, and you can see that the color effectively illustrates an improved infant survival rate in recent years for all countries in this view.

Figure 6: Infant mortality rate by country and year in a single table

Figure 7: Highlight table

Now change it into a single table (Figure 6) with rows and columns combined with the data in the chart, and then into a highlight table (Figure 7):

1. Move the Year dimension from rows to columns.
2. Add the Country dimension to the rows, so that you now have a data table with years in the column labels and countries in the row labels.
3. In the values area (or the Text Marks card), you have an average of infant mortality rate. It doesn’t matter if you use sum or average here because for each row and column coordinate in the data table, you only have a single corresponding value, but it makes the most sense to just select the average aggregation to line up with the other visuals. If you inspected the Excel file, you may remember this is what the data table looks like.
4. To convert to a highlight table, select the highlight table icon from the Show Me menu. If it switches the rows and columns when you convert the visual type, just move them back into the correct positions.

Data Aggregation Options
Within data visualization platforms like Tableau, you can aggregate data through functions such as counting, count distinct, sum, average, minimum, or maximum.

Selecting an aggregation option allows you to analyze trends in the data set.
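To see why sum and average agree in the country-by-year table, here is a minimal Python sketch. The rows and the `pivot` helper are hypothetical stand-ins for the Gapminder data and for what Tableau does internally, not a description of either.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format rows: (country, year, infant mortality rate).
records = [
    ("Country A", 2014, 30.0),
    ("Country A", 2015, 28.0),
    ("Country B", 2014, 4.0),
    ("Country B", 2015, 3.5),
]


def pivot(rows, key, aggregate):
    """Group the rates by key(row) and apply the aggregate function to each group."""
    cells = defaultdict(list)
    for row in rows:
        cells[key(row)].append(row[2])
    return {k: aggregate(values) for k, values in cells.items()}


# One value per (country, year) cell: sum and average return the same number.
cell_sum = pivot(records, key=lambda r: (r[0], r[1]), aggregate=sum)
cell_avg = pivot(records, key=lambda r: (r[0], r[1]), aggregate=mean)

# Several values per country once the year is dropped: the two now differ.
country_sum = pivot(records, key=lambda r: r[0], aggregate=sum)
country_avg = pivot(records, key=lambda r: r[0], aggregate=mean)
```

At the finest grain the choice of aggregation is cosmetic, but as soon as a cell covers more than one row (for example, a country across several years), only the average remains meaningful for a rate.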



Figure 8: Updated highlight table with year bins

Notice that the empty values show colors, which you don’t want to see. You’re not done with this visual yet, and it won’t look like this after you finish making modifications to it.

Use Bins to Simplify Visuals
Notice in the data table you just created in Figure 9 that because you’re measuring rates over more than two hundred years, you can’t see all the years without having to use the scroll bar. You also already know that you don’t have consistent data back to historical periods farther in the past and even across contiguous time spans. Averaging the infant mortality rate across ten-year time segments rather than a single year using bins in Tableau creates some noted advantages:

• If a country has gaps in a time range, averaging out the rates within a ten-year segment allows you to smooth out those inconsistencies.
• It also makes it possible to see the entire time range in a single view, as you can see in Figure 8. I chose to use ten years because it allows me to have enough aggregated rates within a reasonable time range without missing too many data points.

To create bins for the years and update the highlight table (Figure 8):

1. Add a newly calculated field for the year and enter the formula Year (YEAR) = YEAR([Year]) (see Figure 9).
2. Convert this new field from a measure to a dimension by right-clicking on the new measure and selecting Convert to Dimension.
3. Now, right-click on this new dimension Year (YEAR), select Create > Bins, and a new dialog box will open up.
4. Use 10 (as in ten years) for the size of the bin and keep the name as Year (Bins).
5. You can now see a little histogram icon next to the field name in the dimensions list (see Figure 10).
6. Now add the new year bin to the data table next to the Year dimension in the column shelf, and remove the Year dimension because you no longer need it.

Figure 9: Years measure calculation

Sort the Labels
The highlight table you created (shown in Figure 10) shows the trends by country, where you determine the order of the countries in the labels simply by their alphabetical order. You might find this a helpful setup if you want to easily find Belgium in the list, for example, but you can’t identify the country with the lowest rates without having to navigate through the list. I think it makes the most sense to put the country with the best outcome since 2010 at the top and rank the rest of the countries as they fall into the subsequent order for infant mortality rates, as you see in Figure 11.

To sort the country order:

1. Go to the last column where the 2010 through 2015 years aggregate and click on the header. You’ll see the sorting icon that looks like a little bar chart appear.
2. Click on the little horizontal bar chart icon once, where you see the highest rate for Angola, then click on it again to see it listed by lowest infant mortality rate.

Make the Labels Easy to Read
If you cut off the label names in a visual, do you expect the viewer to fill in the missing letters and guess the name? In Figure 11, you can’t see the entire country name for the healthiest country. Even if you know to add an “in” to



complete “Liechtenstein,” that may not be an option when you have two hundred or so country names to guess from, which Figure 12 will spare you the pain of having to do.

Figure 10: 10-year bins dialog box

Figure 11: A table sorting by lowest mortality rate in 2010 to 2015 to highest rate

To expand the size of the country column:

1. Hover over the border between the country name and the values section until a double arrow appears.
2. Drag the arrow until the column width expands to comfortably fit the country names in the immediate view, as you see in Figure 12.

You can also wrap the text fields, but I wouldn’t recommend that approach for the highlight table because it increases the height of these wrapped fields, throws off the sizing for the entire visual, and can make it more difficult to read.

Use Filters Within Visuals
The line chart you created in Figure 4 looks like colored spaghetti lines over a two-hundred-year time frame, which can make it slightly difficult to analyze. Because incomplete data drives much of this fluctuation, if you want to use this line chart visual as an effective analysis tool, it seems sensible to filter down the chart to only show the trends from 1950 and onward, as you see in Figure 13.

To adjust the line chart:

1. Go to Sheet 1 and add Years to the Filters Marks card.
2. Select Years and the condition as greater than or equal to the 1st of January in 1950, as you see in Figure 14.
3. Filter out the null region from the table (remember these are countries that you don’t have a matching region for because they are so small) by dragging the Region dimension to the Filter card.
4. Also filter out nulls by excluding them from the data by clicking on the null values at the bottom of the chart and selecting Filter data.

You also add the year filter to the filled map chart as you can see in Figure 15, with the option to select multiple or all years within the view rather than showing the average infant mortality rate across all available years. This allows the viewer to create a custom view of the map based on their selected year, and dynamically update the colors on the map that represent the infant mortality rates averaged across the selected year or years.

To add the Years filter:

1. Go into the map worksheet and put a filter on the filters shelf.
2. Select all the years, and then select to show the filter.
3. Setting this up as a drop-down list has many benefits, including taking up less space. I’d also recommend the drop-down list because you can see that the single list takes up a great deal of space and doesn’t do much for you.

The drop-down list is shown in Figure 16.

I’ll revisit the filter options in much more detail later when you set up the dashboard.

Applying Color Effectively
In Tableau and many other data visualization platforms, the application automatically assigns a color palette to the chart. However, using the default option may not present the best color scheme. To leverage color effectively, you want to limit the color scheme you use (as contradictory as that sounds) because strategically applying just a few colors not only makes the visuals easier to read, but also allows users to focus on key trends and numbers.

Color-blindness is a visual disability that affects the eyesight abilities of one in ten men and a smaller group of women. You may be color-blind yourself or work with someone who is, or you may not even realize that this impairment is among your



colleagues and peers. If you look at the diagram in Figure 17, you can see how orange and blue navigate around issues with vision impairment pretty easily. Green and red, on the other hand, look the same for those with color-blind impairments. Also, remember that, like many other disabilities, it occurs on a spectrum rather than as an absolute impact.

Figure 12: Adjusting column widths for country label names

Figure 13: Filtered line graph visual

Accounting for those with color-blindness can serve as a starting point to selecting your own color palette. Tableau has a color palette that you can see in Figure 18, specifically offering ten color options to choose from.

You want to give a unique color for each region so that even those who are color-blind can distinguish the regions in the line charts, as seen in Figure 19.

You update the filled map in Figure 15 to use a diverging Orange-Blue color scale with a very light gray serving as the color representing the midpoint. Although blue represents lower infant mortality rates and orange represents higher infant mortality rates, the key part about setting up this color scale is selecting what value to use as the midpoint (Figure 20).

In 2015, the country of Angola experienced the highest infant mortality rate of all the countries, at 96 deaths in the first two years of a baby’s life for every 1000 live births. I decided to set the center or midpoint of the color scale to this rate because it allows the viewer to put in context how historical infant mortality rates across all countries compare to today. It allows the viewer to see that although unfortunately Angola still lags behind other countries in population health in today’s world,



it still represents an improved infant survival from historical infant mortality rates across a two-hundred-year time span as you see in Figure 20, including a lower mortality rate than a developed country, like Germany, saw historically.

Figure 14: Setting up a Year filtering condition

Figure 15: Adding years filter to map

To change the colors on the filled map (Figure 21):

1. Go to Sheet 2 and click on the Color Marks card.
2. A dialog box opens up where you select the Orange-Blue color scheme, and then select to reverse the colors so that blue indicates lower rates and orange indicates higher rates (Figure 20).
3. To change the midpoint that the color scale uses, go to the Advanced options and put a check mark next to the center options, where you can now type in 96 as the value to center the color scale.

You can then apply this same Orange-Blue color scale palette to the highlight table from Figure 12 to better analyze two aspects of the infant mortality rate data: rankings between countries and improved health outcomes over time. You use the same midpoint as for the filled map with Angola’s 2015 infant mortality rate of 96.

As you see in Figure 22, although all the countries in the most recent time frame of 2010 to 2015 have better infant survival rates than Angola (indicated with the blue cell color), when you look back at historical trends for these rates, some of the most developed countries today, like Japan and Singapore, had higher infant mortality rates only fifty years ago than Angola does today. Even developed European countries, like France, Germany, and Austria, also have much lower rates. Although you may lament about the difference between the



healthiest and the sickest countries today, you need to remember that these outcomes still represent a much healthier world for everyone now than less than half a century ago.

Figure 16: Setting up a Years filter on the filled map

Figure 17: Normal vs. color-blind vision

In the highlight table, to update the color scheme (Figure 22):

1. Click on the Color Marks card to open up the selection options.
2. Select Orange-Blue diverging, reverse the color scheme, and then, on the Advanced options, select Center and put in 96, which is the infant mortality rate for Angola in the most recent year, 2015.
3. To remove the extra blue spaces for the null values, right-click on the mortality rates with the colors and select Filter. Then select the Special options by choosing the button on the far right and selecting Non-null values.
4. You also want to remove the text values, so select the second infant mortality aggregation with the text icon next to it and delete it.

If you run into problems where Tableau won’t let you remove the color from the null cells (I ran into this a time or two when testing), I suggest trying to recreate the visual or going back to previous steps and removing the null values at that point (you may have to test different options). If the visual flips the rows and columns but removes the nulls, you can easily flip it back into the positions you want.

Placing the Visuals on the Dashboard
So far, you’ve updated the chart type and formatting of individual visuals, but you now need to take a step back and see how all of these visuals come together in a single consolidated dashboard with which the viewer can dynamically interact. After making several strategic design and formatting decisions for the visuals, you end up with the dashboard you see in Figure 23.

You want the dashboard to line up in a way that’s easy to read by purposely deciding:

• To make all of the charts big enough to view without scrolling or panning in.
• To place visuals, such as the highlight table, to the side to effectively create a border.
• To use a maximum of three large visuals to avoid clutter.

You now need to make adjustments to get everything to fit together after updating each visual to get the updated dashboard, as you see in Figure 24.

To make these changes:

1. Remove Sheet 4 from the dashboard by selecting the visual so you can see a box around it and then click on the X in the upper right-hand corner.
2. Take the Region key on the upper right-hand side and drag it over so that you can see it on the top of the line chart, and then adjust it to make it smaller. You can also adjust the width of the region names by clicking into the legend and when you see an arrow pop up, drag the width of the text field to where you want.
3. Now drag the infant mortality color gradient to underneath the map and to the left of the highlight table. Drag the top of the scale up to put the label above it.
4. Remove the layout container from the right-hand side of the screen where the legends used to be by clicking on it, then clicking on the X.
5. To make the highlight table bigger so you can see it in a bigger picture on the dashboard, select the container and pull it over until it almost takes over half the screen on the right-hand side.
6. To get the highlight table to fit on an entire view, select the chart, then click on the down arrow at the edge, select Fit and choose Entire View.
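Before moving on to titles, it may help to see what centering the diverging color scale on 96 means numerically. The sketch below is an illustration only: the piecewise-linear mapping and the `low` and `high` bounds are assumptions, not a description of how Tableau computes its palette internally.

```python
def diverging_position(value, low, center, high):
    """Map a rate to [-1, 1]: -1 at low, 0 at center, +1 at high (piecewise linear)."""
    if value <= center:
        span = center - low
        return 0.0 if span == 0 else max(-1.0, (value - center) / span)
    span = high - center
    return 0.0 if span == 0 else min(1.0, (value - center) / span)


def color_family(value, center=96.0):
    """Name the side of the reversed Orange-Blue scale a rate falls on."""
    if value < center:
        return "blue"    # lower infant mortality than the midpoint
    if value > center:
        return "orange"  # higher infant mortality than the midpoint
    return "neutral"     # exactly at the midpoint (light gray)


# Hypothetical rates; 0 and 400 are assumed bounds for the scale.
for rate in (4.0, 96.0, 250.0):
    print(rate, color_family(rate), round(diverging_position(rate, 0.0, 96.0, 400.0), 2))
```

With the midpoint fixed at Angola’s 2015 rate, every cell below 96 falls on the blue side and every cell above it falls on the orange side, which is what lets the viewer compare historical rates with the worst rate today at a glance.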



Put Titles and Labels on the Dashboard option, and the floating option frees up even more space to use
Although you already made strategic design and formatting for the rest of the visuals on the dashboard.
decisions to improve the dashboard, you still need to effec-
tively label these components or visuals so that the viewer You’ll want to add:
understands relatively quickly exactly what each component
represents, as you see in Figure 25. Although you may know • A title to the entire dashboard to explicitly tell the
what Sheet 1 does because you designed the visual, you viewers what data they’re working with and give them
can’t assume that the viewer does as well. a nudge to indicate that falling infant mortality rates
represent improved health outcomes, and also to give
You can also add the year filter to the map chart that allow you them context on the trends and encourage them to
to see the infant mortality rates around the world for the se- learn more.
lected year or years. This allows the user to dynamically change • Titles on individual visuals, where needed, to provide
their own map view in the dashboard themselves, as you saw in context on what they see and how to potentially think
Figure 25. The drop-down list takes up less space than the list about the results and interact with the data.

Figure 18: Color-blind palette

We Prefer to Eat Pie Rather than See It in a Visual
Although pie chart visuals allow you to see the rough breakdown of the “pieces of the pie,” you can’t easily compare the size of the actual pieces within just a few seconds. Can you tell more than which piece of the pie chart represents the largest value?

If you want to know the totals and rank of the aggregated numbers and you can’t easily do that analysis, you need to use another chart. You can easily do this analysis in a bar chart. If you’re analyzing two numbers, you can put the totals and their percentages in a small table or even a KPI metric, which saves space and becomes easy to read for a small amount of data.
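
As a rough illustration of the "totals, rank, and percentages in a small table" idea, here is a minimal Python sketch. The category names and numbers are placeholders invented for the example, and `ranked_shares` is a hypothetical helper, not something Tableau provides.

```python
# Hypothetical category totals; the names and numbers are placeholders.
totals = {"A": 120, "B": 45, "C": 300, "D": 35}


def ranked_shares(totals):
    """Return (rank, name, total, percent_of_whole) rows, largest total first."""
    grand_total = sum(totals.values())
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, name, value, round(100 * value / grand_total, 1))
        for rank, (name, value) in enumerate(ordered, start=1)
    ]


table = ranked_shares(totals)
for rank, name, value, share in table:
    # One row per category: rank, name, total, and share of the whole.
    print(f"{rank}. {name:<3} {value:>5} {share:>5.1f}%")
```

A sorted table like this answers the ranking question directly, which is the comparison a pie chart makes hard to read at a glance.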

Figure 19: Updating region key colors



• The Years filter that enables you to see different views of the filled map by selecting the years you want to see.

Determine whether you can remove some titles, too, such as the map title, because you already know it’s a map.

The steps to add titles are:

1. In the Dashboard 1 tab, select the Dashboard menu at the top, select Show Title where you can input the name “Infant Mortality Rates Trend Downwards for ALL Countries Over the Last Half Century” and the subtitle “Gapminder data helps us prove that the world is getting better for almost everyone” (see Figure 26).
2. Make the title in size 14 font and the subtitle details in size 10.
3. Add names to the visuals. Double click on the title for Sheet 1 and in the dialog box, enter “Since 1950, infant mortality trends across all regions trend downward,” and set the font size to 12.
4. Next, double click on Sheet 2 and delete all the text in the dialog box so you no longer have a title on the map.
5. Double click on Sheet 3 to change the name of the highlight table to “All countries show improvements in the infant mortality rates, from the healthiest countries to the sickest” and set the font size to 12.
6. Select the map visual container, then click on the down arrow on the outside frame, select Filters, and select the Year of Year from the list, where you now see the filter on the far right.
7. Select this filter container, click on the down arrow, and choose Multiple Values (dropdown). Then click on the down arrow again, select Floating, which means that you can move the filter over the map to select the year, and you can drag it over to the bottom of the map where it doesn’t directly sit on top of any countries in the map.

Figure 20: Orange-Blue diverging color palette

Figure 21: Diverging color palette filled map

Figure 22: Blue-orange heat applied to highlight table

Figure 23: The updated dashboard

Adding Elements of Analysis
In the line chart in Figure 19, you saw that since 1950, average infant mortality rates across all regions trend down, which indicates an improved population health outlook. Similar to the way you set up the blue-orange color scale for the filled map and highlight tables in Figure 21 and Figure 22 with the highest mortality rate in 2015 as the midpoint in the diverging color scale, you can also use a reference line as another way for the viewer to analyze these rate trends. In 1950, you can see that the healthiest region, Europe, had an infant mortality rate of 58.9 as you see in Figure 27. By setting a constant reference line at this point on the y-axis, you can see that although other regions may lag behind Europe in terms of relative improved health outcomes, you can use this number as a benchmark to show what you could call a time delay in this trend.



To add a reference line to the line chart:

1. Right-click on the y-axis and choose Add Reference Line.
2. Select the entire table, select a constant line, enter the value of 58.9, choose to see no label, and then use a thick dashed line, as you see in Figure 28.

Use Tooltips as a Hidden Design Weapon
Tooltips allow you to increase the capabilities within interactive data visualization applications because you can hide some of the information away from the immediate view of the user, but they pop up when the user hovers over the relevant data points in a visual. I sometimes think of it like a third dimension that you can add to a two-dimensional dashboard. You can add data to tooltips that you don’t see in the visual as well. You can also customize the wording and structure of the tooltips, as seen in Figure 29.

In the highlight table, I find it difficult to read the row and column headers in the visual because there are so many of them in a small space. By pushing the details into the tooltips, as you see in Figure 30, you can ultimately format a clean visual without compromising the design or details behind it.

To edit the text and values within the tooltips:

1. Click on the Tooltips Marks card that opens up a new dialog box on the Sheet 1 tab for the line chart.
2. Edit the tooltip to refine and summarize what you want to say (you can type in the tooltip box to rename the labels or create sentences). Change the year for the bins to Decade and delete Avg from the mortality rate details. You can also change the details in the map to make them easier to hover over.

Creating Interactivity Through Instruction Prompts
Businesses decide to leverage data visualization tools like Tableau because of the interactive capabilities that enable the end users to explore and analyze data trends. You want to think about how to enable the user (the customer) to experience the “Ikea Effect” discussed earlier, where they do most of the work by interacting with the dashboard, but still feel that their investment in using the dashboard was worthwhile. Like a furniture pack with components ready for assembly, as the developers of the dashboard, you analyzed, measured, and packaged the pre-made components for the customer before they even receive them.

Now you need to communicate the instructions for how to assemble the product by telling the user how to interact with the visuals within the dashboard. You can’t assume that because you know how to interact with the data that they will as well. You need to provide clear instructions that guide them in how they can change filters or click on countries in the map to change the dashboard view.

To help the viewer navigate the dashboard and make the most of using the dashboard, you should use nudging techniques to give them instruction prompts for how to use the visuals. These nudging techniques appear as instructions in the visual or filter title to gently guide them with their selection options and encourage them to explore the dashboard. You want to avoid wordy instructions or difficult procedures to follow. You need to simply tell the viewer what you want them to do, without it coming across in an authoritarian way. Examples of nudging instruction cues that you see in Figure 31 include:

• Adding the Gapminder logo by selecting the Dashboard tab of the pane on the far left-hand side of the screen, and then selecting Image from the options at the bottom, where a new container opens up in which you can select the image path in the dialog box.
• The logo now appears in the dashboard but looks strange because of the position it currently resides in, so you need to highlight the container and move it to the top left-hand corner of the screen before the title details. This can be quite tricky, so put the logo in with a container and move the dashboard title into the blank container.

Figure 24: Dashboard with fitted visuals



• Selecting a year from the filter to change the map view.
• Clicking on a region to filter the entire view.
• Hovering over a color cell block in the highlight table to see more details.

To add instruction prompts and the final formatting details to the dashboard (Figure 29):

1. In the title of the line chart, double-click to edit the title, and, underneath the title, add the instructions by entering the text “Select a Region from the legend to see the related countries in the map and their trends in the highlight table.” Make this addition font size 8 so the users can see it, but it’s not too prominent.
2. In the filter for the map, double-click on the year filter, and change the title to say “Select year to see trends in map.” You can also double-click on the legend below to update the Avg in the legend title to Average.
3. Now in the highlight table, double-click on the title and add the text “Hover over a cell to see the country, year, and infant mortality rate.” Again change the font size to 8.
4. You also want to remove the labels from the highlight table because you can’t even read them in the first place. Right-click on both the column labels and the row labels separately and choose to remove labels for both of them.
5. Also remove Sheet 4 because the dashboard doesn’t use it as a visual.
6. You can add the Gapminder logo to the dashboard by first selecting an empty container (found on the bottom left) and dragging it onto the canvas. Push it into the position next to the dashboard title so they share the same space. Now keep this layout container selected and choose Image from this same selection option box, and point to the location where you saved the Gapminder logo. Now adjust the chart to see the entire logo as well as the other visuals without them pushing one another out.
7. Now you need to set up the interactivity between the charts so the instruction prompts work. To do so, go to Sheet 2 and make sure to add the Region to the Details Marks card, so that the region legend can filter this map. Do the same to the highlight table by going to Sheet 3 and adding the Region to the Details Marks card.
8. Now go back to the dashboard and select the line chart to highlight its container. Select the filter icon to set up this chart as a filter for the other charts.

Figure 25: Dashboard titles and visual titles

Figure 26: Adding dashboard titles

Figure 27: Updated line chart with reference line

Figure 28: Reference line dialog box conditions



Figure 29: Tooltip dialog box

Figure 30: Highlight table with updated tooltips

9. Make sure that you can see all your visuals. The best way to do this is to make sure to leave enough white space between the legend fields and the visuals, and then upload to Tableau Public Online to make sure it fits as you anticipated. If it doesn’t, go back to the dashboard and make adjustments based on what you saw pushed out of place. This may take a little bit of practice!

Now that you’ve set up the dashboard, you can put yourself in the position of the user and test it out. In Figure 32, you see Asia selected as the region in the line chart legend. This creates a new view of the data that highlights key trends and analysis for the Asia region. You can also select a single year in the map to see the Asian countries’ health for that year. Choosing Asia filters all three charts, and you can see in the highlight table the disparity between the rankings within the Asian countries.
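
Conceptually, the region selection and the year filter behave like two predicates applied to the same underlying rows. The Python sketch below is only an analogy for that behavior, not how Tableau implements filter actions. The `Row` type, the `apply_filters` helper, and every data value are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    country: str
    region: str
    year: int
    rate: float  # infant mortality rate; the values below are placeholders


rows = [
    Row("Country A", "Asia", 1950, 150.0),
    Row("Country A", "Asia", 2015, 30.0),
    Row("Country B", "Asia", 1950, 90.0),
    Row("Country B", "Asia", 2015, 5.0),
    Row("Country C", "Europe", 1950, 60.0),
    Row("Country C", "Europe", 2015, 3.0),
]


def apply_filters(rows, region=None, years=None):
    """Keep rows matching the selected region and years; None means no filter."""
    return [
        r for r in rows
        if (region is None or r.region == region)
        and (years is None or r.year in years)
    ]


# Selecting Asia in the legend narrows every view to the same subset.
asia_all_years = apply_filters(rows, region="Asia")
# The year filter then narrows the map view further.
asia_2015 = apply_filters(rows, region="Asia", years={2015})
# A ranking like the one the highlight table exposes: lowest rate first.
ranking = [r.country for r in sorted(asia_2015, key=lambda r: r.rate)]
print(ranking)
```

Thinking of the interaction this way can help when you test the dashboard: each selection should only ever shrink the set of rows a view draws from.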



Figure 31: The final dashboard with instruction prompts

Mitigating the Blind Spots of Those with Color-Blindness
Roughly one in 10 people experience color-blindness and you need to be mindful of them. Some of you reading this are color-blind.

Although it can seem overwhelming to create a color palette specifically for color-blind viewers, you can follow a few simple rules to avoid problems. As a rule of thumb, I avoid green and red together, and instead substitute orange and blue for heat maps, for example.
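
If you ever need to reproduce an orange-and-blue heat scale outside of Tableau, the idea can be sketched as a diverging scale with a chosen midpoint. This is a minimal Python sketch under stated assumptions: the RGB values are illustrative picks, not Tableau's actual Orange-Blue Diverging palette, and `lerp` and `diverging_color` are hypothetical helper names.

```python
# Illustrative RGB endpoints (assumptions, not Tableau's palette values).
BLUE = (31, 119, 180)      # low values
NEUTRAL = (240, 240, 240)  # midpoint
ORANGE = (255, 127, 14)    # high values


def lerp(a, b, t):
    """Linearly interpolate between two RGB tuples for t in [0, 1]."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))


def diverging_color(value, low, mid, high):
    """Map a value onto a blue -> neutral -> orange scale centered on mid."""
    value = max(low, min(high, value))  # clamp out-of-range values
    if value <= mid:
        t = 0.0 if mid == low else (value - low) / (mid - low)
        return lerp(BLUE, NEUTRAL, t)
    t = (value - mid) / (high - mid)
    return lerp(NEUTRAL, ORANGE, t)


# Example: rates from 0 to 100 with the midpoint at 50.
for rate in (0, 25, 50, 75, 100):
    print(rate, diverging_color(rate, low=0, mid=50, high=100))
```

Because the two ends differ in both hue and lightness relative to the neutral midpoint, the scale avoids relying on a red-versus-green distinction.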

Figure 32: Filter entire dashboard for Asia region

You maximize your dashboard’s influence by taking the initial Tableau dashboard and strategically making changes to update the design, formatting, and interactivity. This, in turn, increases the user’s understanding and interactions with the dashboard interface and the likelihood they will use it by letting them take ownership in changing the views. Designing an effective dashboard gives flexibility.

I encourage you to take these best practices techniques and use some creativity to set up and experiment with visual options, then see how the users respond and go from there!

Helen Wall



(Continued from 74)

should consider the kind of example you’re setting for your daughter.”

And so Maria heads off with Carol and the rest of the gang. That’s a clear-cut case of a manager allowing her people to “lead up,” to make the big decision. For growth to happen, those at every level need to exert influence on the people above them in the organizational chart and managers can help them do that by responding favorably to their ideas.

When You Fall Down, Get Back Up
Failure hurts. In Carol’s case, that includes crashing a go-kart as a kid, falling off a climbing rope as a young woman, and putting up with another pilot—this one male—in a bar, over a beer, telling her she’s a “decent pilot, but [she’s] too emotional.”

Later on, these scenes re-emerge, but this time we get to see how those scenes play out, with Carol picking herself up after every failure. Sure, it’s a montage just like the ones Nike feeds us, but those commercials get a bazillion views because they work. Managers don’t give up; they get up.

“Failure, failure, failure, failure,” said Staci. “You keep getting up and that will get you closer toward your goal.”

Don’t Let Anybody Tie Your Hands
Throughout Captain Marvel, Carol is handicapped from using her full powers. In that opening fight scene, she complains to Yon-Rogg that he won’t let her use her special energy waves, and he insists that if she were ready to apply them, she’d also be able to knock him down without them. Later, when she confronts the head of her old planet, the Supreme Intelligence, she realizes that she’s “been fighting with one arm tied behind my back” and yanks out the chip attached to her neck that they’ve been using to control her. Finally, in a scene with her former mentor, Yon-Rogg tosses away his weapon and eggs her on, encouraging her to just “turn off the light show” and fight him arm to arm. Her response as she blasts him away: “I have nothing to prove to you.”

The best managers don’t force their people to act in ways that minimize their powers. They embrace the brilliance, help them turn their flaws into good qualities and allow them to remain true to themselves.

Yes, Sometimes You Have to Do the Dirty Jobs
After saving C-53, otherwise known as Planet Earth, it’s only right that Carol and Fury wash the dishes. Even superheroes need to help out with household chores. And like them, managers should always look for opportunities to do the grunt work, if only to remember how hard and mind-numbing it can sometimes be.

Plus, said Suzanne, an administrative services officer for a county government (and my wife), “Doing tasks with people can sometimes lead to greater strength of relationships for the people you need to have follow you.”

Compile Your Playlist
Heck, yeah, few jobs are as hard as managing people. But music can help you, like nothing else, push through the limitations, recommit to the mission, and inspire your team to keep up.

What’s on Captain Marvel’s playlist? For a few, try Heart’s “Crazy on You,” No Doubt’s “Just A Girl,” and Des’ree’s “You Gotta Be.”

With tunes like that and a little work honing that inspirational stance, Danvers, I’d think about signing onto your team.

Dian Schaffhauser
schaffhauser@gmail.com
Her management days over, these days, Dian Schaffhauser prefers to go it alone as a freelance reporter covering business and technology from Northern California.

CODE COMPILERS

Nov/Dec 2019
Volume 20 Issue 6

Group Publisher
Markus Egger

Associate Publisher
Rick Strahl

Editor-in-Chief
Rod Paddock

Managing Editor
Ellen Whitney

Content Editor
Melanie Spiller

Editorial Contributors
Otto Dobretsberger
Jim Duffy
Jeff Etter
Mike Yeager

Writers In This Issue
Sumeya Block
Sara Chipps
Kate Gregory
Julie Lerman
Ashleigh Lodge
Sahil Malik
Jeannine Takaki-Nelson
Dian Schaffhauser
Craig Shoemaker
Helen Wall

Technical Reviewers
Markus Egger
Rod Paddock

Production
Franz Wimmer
King Laurin GmbH
39057 St. Michael/Eppan, Italy

Printing
Fry Communications, Inc.
800 West Church Rd.
Mechanicsburg, PA 17055

Advertising Sales
Tammy Ferguson
832-717-4445 ext 26
tammy@codemag.com

Circulation & Distribution
General Circulation: EPS Software Corp.
Newsstand: The NEWS Group (TNG)
Media Solutions

Subscriptions
Subscription Manager
Colleen Cade
ccade@codemag.com

US subscriptions are US $29.99 for one year. Subscriptions outside the US are US $49.99. Payments should be made in US dollars drawn on a US bank. American Express, MasterCard, Visa, and Discover credit cards accepted. Bill me option is available only for US subscriptions. Back issues are available. For subscription information, e-mail subscriptions@codemag.com.

Subscribe online at
www.codemag.com

CODE Developer Magazine
6605 Cypresswood Drive, Ste 425, Spring, Texas 77379
Phone: 832-717-4445
Fax: 832-717-4460

codemag.com What Captain Marvel Can Teach Us about Management 73


MANAGED CODER

What Captain Marvel Can Teach Us about Management
Too bad the timing wasn’t better for Captain Marvel and Megan Rapinoe to coordinate. Otherwise, instead of standing like Tinker Bell as she stared down Ronan the Accuser, with his take-no-prisoners Kree military force and shipload of warheads, Carol Danvers’ posture would have been a bit more, well, managerial.

After all, who of us alive will ever forget the outstretched arms and taunting stance the purple-haired Rapinoe displayed every time she scored during the Women’s World Cup? (Alas, the movie appeared months before the American win in France.)

Pose aside, however, there was plenty more signaling that “Vers” was somebody worth following. Recently, a small corps of friends gathered in front of my flatscreen to rewatch the movie (they’d all seen it in the theater), finish off a couple of bottles of California vino and share what struck them about Captain Marvel’s management qualities, viewed from their own perspectives as managers.

Warning! Yes, the rest of this column contains a mighty collection of what some might consider spoilers, but that I prefer to call “preparatory notes.” I enjoyed my second viewing of Captain Marvel more than my first, not only because I knew what was coming but also because I understood things better. (The wine helped too.) No need to thank me.

It’s Not Always Bad to Be Driven by Your Emotions
During a sparring match, when mentor Yon-Rogg has our hero Carol pinned to the floor and her fists begin to sizzle orange in frustration, he tells her, “There is nothing more dangerous to a warrior than emotion.” (Oh, how often have we heard, “Don’t let your emotions get the best of you”?) Yet leaders know that the passion behind the emotion can drive them and their staff to keep going when things are looking down. On top of that, having a grasp of emotional intelligence—being able to understand your own emotions and influence the emotions of those around you—will get you much further than sheer technical skill. Understanding how people work and what motivates them emotionally is critically important to pulling them in to help you achieve your goals and is far more effective than sending out yet another directive-by-Slack.

Stop Apologizing
Carol has just spent the last five minutes chasing down a continually shape-shifting Skrull through a moving Metro Rail train. Imagine trying to hunt down a prey that can take the form of anybody around you. How do you identify the enemy? But because she’s our hero, she has an innate ability to pick out the bad guy, which becomes more obvious when he blasts his way through the top of a railcar and they take the fight up on the roof. As 1990s Los Angeles flashes by in the background, Carol gives as good as she gets until they head into a tunnel. Suddenly, she can’t see anything until they stop at the station.

She leaps out of the car to join the surge of people disembarking and spies the shapeshifter walking away, still in the image of the man whose form he last took. She grabs him from behind and he turns, ready to receive the blow. But one look tells her this is the authentic person, not the form stolen by the shapeshifter. Her fist drops and she moves on—with no apology.

Later on, after she’s insulted Tom, the guy who lives next door to her BFF Maria, the same thing happens.

As my friend Donna, a college instructor, pointed out, “Look at that! She didn’t apologize. Women apologize far too freakin’ much.”

Sure, a good manager is capable of saying, “I’m sorry,” but not every time she makes a blunder. So when does an apology pass Carol’s lips? Only when she finds out just why Skrull General Talos has been trying to capture her. And then it’s an authentic apology, born of self-awareness. When a manager uses “sorry” too much, it loses impact and can be perceived as weakness.

Jump on the Problem
Carol and S.H.I.E.L.D. agent Fury are in a hidden government mountain bunker hunting down information about the mysterious Dr. Lawson, whom Carol believes holds the key to stopping the Skrulls from taking over the universe. When the pair tell officials why they’re there, they’re locked into a nameless office. After a half-hearted attempt to get free, Fury pulls out his “state-of-the-art two-way pager” and sends a furtive message to his work partner, Agent Coulson: “Detained with target. Need backup.” Carol expresses curiosity about the “communicator” and Fury reassures her that he’s only texting his mom.

When they finally do escape, Carol comes under attack again from what we believe at first to be a S.H.I.E.L.D. team; Fury has led them right to her. He quickly realizes, however, that appearances can be deceptive and rejoins Carol in her attempt to leave the bunker, this time via fighter jet. They barricade themselves on the bunker’s flight deck, and moments from certain doom, Carol holds her hand out. She wants Fury to give her the communicator. Now. As she tells him, “You obviously can’t be trusted with it.”

“She jumps right on it,” observes my friend Staci, firefighter and forest aviation officer. “She doesn’t let it fester.” She takes care of what she views as a problem immediately. Good managers don’t avoid conflict.

Likewise, they don’t hold grudges. That takes too much energy.

Let Your People Lead Up
When Carol’s friend Maria is invited to join the mission as a co-pilot to track down Dr. Lawson’s ship in a jerry-rigged plane, she begs off. As a single mom, she reminds Carol that she can’t leave her daughter Monica. “There’s no way I’m going, baby,” she says. “It’s too dangerous.”

But Monica won’t have any of that. “Testing brand new aerospace tech is dangerous. Didn’t you use to do that?” she suggests. Besides, she adds, she’ll stay with her grandparents.

Maria turns to Carol, who’s listening in on the conversation. “Your plan is to leave the atmosphere in a craft not designed for the journey, and you anticipate hostile encounters with a technologically superior foreign enemy. Correct?”

Carol doesn’t say a word; just shrugs. But Monica speaks up: “That’s what I’m saying. You have to go.” Besides, she adds, “I just think that you

(Continued on page 73)

