Chemical Literature (Chem 184/284)

Lecture 14: SciFinder Scholar, Part 3:
Substance Searching: Names, Molecular Formulas and Displaying Substance Records

Searching by Chemical Substance: Identifiers and Molecular Formulas

SciFinder Scholar opening screen

The second option on the SciFinder Scholar opening screen is "Locate". The Locate options all deal with finding "known items". Selecting this option reveals further choices:

Locate options

The Locate Literature options are used to find specific articles or patents when you have a full or partial citation (e.g. author's name, journal title, pages, CAS abstract number or patent number). We shall not look at these in detail. However, "Locate Substances" using chemical names or CAS Registry Numbers is an important and heavily used feature.

Substance Identifier

Substance Identifier input screen

The Substance Identifier input screen allows you to search by chemical name(s), or by CAS Registry Number(s). You can search more than one identifier at the same time, by putting each name or Registry Number on a separate line.
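As a rough illustration of the one-identifier-per-line input, the sketch below splits such an entry into names and Registry Numbers and verifies each Registry Number's check digit. The check-digit rule is the published CAS one (weight the digits 1, 2, 3, ... from the right, excluding the check digit, and take the sum modulo 10). The function names are hypothetical, and this is not how SciFinder Scholar itself processes the input. The Registry Number 50-78-2 (aspirin) does not appear in these notes; 11061-68-0 (human insulin) is discussed later in this lecture.

```python
import re

# A CAS Registry Number has 2-7 digits, then 2 digits, then 1 check digit.
RN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas_rn(text):
    """Return True if text is a well-formed CAS RN with a correct check digit."""
    match = RN_PATTERN.match(text.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d)
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check


def classify_identifiers(block):
    """Return (line, kind) pairs for each non-empty line of a multi-line entry."""
    results = []
    for line in block.splitlines():
        line = line.strip()
        if not line:
            continue
        if RN_PATTERN.match(line):
            kind = ("registry number" if is_valid_cas_rn(line)
                    else "invalid registry number")
        else:
            kind = "name"
        results.append((line, kind))
    return results


if __name__ == "__main__":
    # Hypothetical entry: two names, two valid RNs, one mistyped RN.
    query = "aspirin\n50-78-2\nnaproxen\n11061-68-0\n50-78-3"
    for line, kind in classify_identifiers(query):
        print(f"{line}: {kind}")
```

A check like this only catches typing errors; it cannot tell you whether a well-formed number has actually been assigned to a substance.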

Searching for common analgesics

In the default format (Standard), results are displayed as Registry Number, structure, number of references in the CAPLUS database, and link buttons (see below). In the View menu, you can also select the Compact format, which displays the structure only, or the Summary format, which adds the systematic chemical name to the Standard record. The Full format is awkward for displaying records in a table.

Records are displayed 3 to a line, with the highest (i.e. most recently added) Registry Numbers first. Clicking on the View menu, you can alternatively select Similarity Sort, which groups "similar" molecules together. CAS does not advertise the criteria it uses for similarity. In either case, use the scroll bar at right to see additional results.

Substance records brief display

Linking buttons:
Button for retrieving references - Gets CAPLUS (or MEDLINE) document references for the compound chosen. (See below.)
Button for commercial availability - Gets CHEMCATS commercial availability information for the compound chosen. (See below.)
Button for regulatory info - Gets CHEMLIST regulatory information for the compound chosen. (See below.)
Button for reaction diagrams - Gets CASREACT reaction diagrams for the compound chosen. (See below.)
Button for displaying 3D structure - Displays a 3D structure model of the molecule. This only applies to organic substances with no stereochemical information.
These buttons are only displayed in records for which the corresponding information is available. The 3D button only appears if the computer being used has Accelrys ViewerPro or ViewerLite software (version 3.5 or higher) installed. This software is available only for Windows computers; the terminals in the UCSB Library are not currently equipped with it.

To see the full display for any record, click the microscope icon to the left of the desired record.

Main record for naproxen

This record derives from the Chemical Abstracts Service Registry file. Note the information provided:

Other types of substances may display other types of information. Here's the record for the antibiotic vancomycin, which is a short peptide chain.

Main record for vancomycin
Main record for vancomycin, part 2
Main record for vancomycin, part 3

Note that the amino acid sequence is given using standard one-letter codes. A table gives the modifications and cross-linking of the sequence. Longer sequences, with more than 255 non-hydrogen atoms, do not have structure diagrams: only sequence strings are given.

Substances with non-stoichiometric formulas, like metal alloys and ceramic superconductors, may display tabular compositions:

Main record for Monel metal

Experimental and Predicted Property Values

Top of record for naproxen experimental properties
2nd part of record for naproxen experimental properties
3rd part of record for naproxen experimental properties
4th part of record for naproxen experimental properties
5th part of record for naproxen experimental properties
6th part of record for naproxen experimental properties

Top of record for naproxen IR spectrum
2nd part of record for naproxen IR spectrum

Top of record for naproxen calculated properties
2nd part of record for naproxen calculated properties
3rd part of record for naproxen calculated properties
4th part of record for naproxen calculated properties
5th part of record for naproxen calculated properties

Other types of substances can have different types of experimental property information appropriate to the substance. Polymers and plastics, such as PTFE (polytetrafluoroethylene, aka Teflon), and metal alloys, have values for properties like breakdown voltage, compressive and tensile strength, dielectric strength, electric conductivity and fracture toughness. Radioactive elements have half-lives, neutron capture cross-sections, etc.

Commercial Source Information

SciFinder Scholar automatically links from Registry file records to the corresponding record in the CHEMCATS chemical catalogs database.
Brief records list each company, when the record was last updated, the name and order number that company uses for the compound, synonyms which it recognizes and the CAS Registry number.
Note that the records list is generally ordered by the date on which the data were updated, most recent first. Note, too, that the availability information may be exported to a Microsoft Excel file for easier manipulation, e.g. sorting by name, price, etc.

Companies selling the chemical aspirin

Clicking on the microscope icon pulls up the full record for the substance in the particular company's catalog, including pricing, addresses, phone, fax, e-mail, etc. Note that there is no guarantee that the price is current, but at least it gives a ballpark estimate.

Aspirin from Research Organics

Regulated Chemicals Listing

SciFinder Scholar also links from each substance record to the corresponding information from the CHEMLIST database of chemical regulatory information. As you can see below, this covers U.S. federal, other national government and U.S. state government agencies. The number of agencies listed grows each year.

Regulatory info on aspirin, part 1
Regulatory info on aspirin, part 2
Regulatory info on aspirin, part 3
Regulatory info on aspirin, part 4

Reaction Display

When you click the "Reactions" button for a substance, you get a screen where you may select the role or roles which the desired compound plays in the reaction.

Reaction roles selection

Clicking OK then retrieves the available reactions from the CASREACT database. Note that this covers organic reactions only, and is not necessarily comprehensive.

Reactions producing naproxen

Note the reactant(s), product(s), catalyst and solvent information. Yields may be provided where available. Clicking on the microscope icon here displays the single-step reactions which make up the overall reaction scheme. There is a link back to the document record where the reaction is reported. If the source information is highlighted, you may click on it to see the document record (and from there jump to full text, or "Get Related", etc.). Reactions will be covered in much greater detail in a later lecture.

Clicking on any substance shown in the diagram allows you to jump to the reactions, references, substance details (i.e. REGISTRY record), commercial sources or regulatory information for that substance. (See below, where I clicked on the catalyst in the first reaction, then selected "Substance Detail".) This allows you to work backward or forward in reaction schemes to create multi-step reactions.

Getting reagent information
Catalyst substance record

Refining Substance Results

Just as sets of references may be "refined", so sets of substances may be narrowed down further. [Note: There is no "Analyze" option for sets of substances retrieved by name, Registry Number or molecular formula.]

Refine substances options

Refining by Property Data

Unless you set them otherwise, certain values of certain properties are used as the default for refining substance answer sets. These are the values of the Lipinski "Rule of Five". Parameters developed by Christopher A. Lipinski and colleagues at Pfizer Central Research, Groton, Connecticut, have been widely adopted by the pharmaceutical industry as a means of identifying compounds that are likely to have good absorption profiles. The Lipinski "Rule of Five" states that compounds are likely to have good absorption and permeation in biological systems and are more likely to be successful drug candidates if they meet the following criteria:

Refining by properties 1
Refining by properties 2
Refining by properties 3
Refining by properties 4
Refining by properties 5
Refining by properties 6
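For readers who want the rule in concrete terms, here is a minimal sketch of a Rule of Five filter, using the thresholds as they are usually quoted from Lipinski's work: molecular weight no greater than 500, log P no greater than 5, no more than 5 hydrogen bond donors and no more than 10 hydrogen bond acceptors. The class and function names are hypothetical, the default values SciFinder Scholar actually loads into its refine screen may differ, and the property values in the example are rough illustrative numbers, not data taken from the Registry file.

```python
from dataclasses import dataclass


@dataclass
class SubstanceProperties:
    name: str
    molecular_weight: float  # g/mol
    log_p: float             # octanol-water partition coefficient (log P)
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(s):
    """Return a description of each Rule of Five criterion the substance fails."""
    violations = []
    if s.molecular_weight > 500:
        violations.append("molecular weight over 500")
    if s.log_p > 5:
        violations.append("log P over 5")
    if s.h_bond_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if s.h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    return violations


def refine_by_rule_of_five(substances):
    """Keep only the substances that meet all four criteria."""
    return [s for s in substances if not lipinski_violations(s)]


if __name__ == "__main__":
    # Illustrative values only (assumptions, not SciFinder output).
    answer_set = [
        SubstanceProperties("small aryl acid (naproxen-like)", 230.26, 3.2, 1, 3),
        SubstanceProperties("hypothetical large glycopeptide", 1450.0, -3.1, 19, 24),
    ]
    for s in answer_set:
        failed = lipinski_violations(s)
        print(s.name, "->", "passes" if not failed else "; ".join(failed))
```

Some formulations of the rule tolerate a single violation; this sketch takes the stricter reading and requires all four criteria to be met.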

"Get References" from a Substance Search

Once a set of substances has been located, you may retrieve references one compound at a time by clicking the "References" button associated with the compound record. Alternatively, you can select a set of substances by clicking on the check box next to the desired compound(s), and then clicking the "Get References" button at the bottom of the substance list screen. If you wish to retrieve references for all the substances in your starting set, it is unnecessary to check any of them.

Substance references selection screen

SciFinder Scholar allows you to select particular types of references for your set of substances. This feature makes use of selected subject terms associated with the Registry Number, including a special field in the record called Roles. In some cases, they also use keywords associated with a given subject (e.g., crystal structure). The image below is a portion of the indexing for a paper found by looking for "analytical studies" of DDT.

Analytical study of DDT

Notice the list of Roles applied to DDT (and to DDE). A compound may have multiple roles in a given document. Not all of the roles assigned by CAS indexers are available as limiting terms for a "Get References" in SciFinder Scholar (as yet). Notice, too, how the structure diagram for the compound in our search query is displayed and the Registry Number is highlighted in blue.

In addition to the Roles selection, there is also a check box that applies only to substances which are biosequences (e.g. proteins or polynucleotides). If you check this box, SciFinder will retrieve not only references containing the Registry Number(s) you have selected, but also closely related biosequences. For example, if you "Get References" for the Registry Number for human insulin, 11061-68-0, it will retrieve only records containing that Registry Number. However, if you check the box, it will also retrieve records containing the Registry Number for generic insulin, for insulin-like growth factor and others besides.

Explore by Molecular Formula

Molecular Formula query screen

Searching SciFinder Scholar for substances by molecular formula is part of the Explore menu, and requires entering the total number of each element. (Only one formula can be searched at a time.) However, the system does not require the elements to be entered in strict Hill order. So long as each element is followed by the number of occurrences of that element, the elements can be in any order, and with or without spaces between them (with certain exceptions; see below).
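To make the idea of Hill order concrete, the sketch below accepts a formula with the elements in any order, with or without spaces, and rewrites it in Hill order: carbon first, hydrogen second, then the remaining elements alphabetically, or strictly alphabetical when there is no carbon. The function names are hypothetical and this is not SciFinder's own parser. Two assumptions are worth flagging: the sketch is more lenient than the entry rule described above, treating an element with no count as 1, and it does not check symbols against the periodic table.

```python
import re
from collections import Counter

# One element symbol (capital letter, optional lowercase letter) plus optional count.
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")
WHOLE_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def parse_formula(text):
    """Return a Counter of element counts from a simple formula string."""
    compact = "".join(text.split())  # drop any spaces between elements
    if not WHOLE_FORMULA.fullmatch(compact):
        raise ValueError(f"not a simple molecular formula: {text!r}")
    counts = Counter()
    for symbol, number in ELEMENT.findall(compact):
        # Assumption: a missing count means 1 (the text asks for explicit counts).
        counts[symbol] += int(number) if number else 1
    return counts


def hill_formula(text):
    """Rewrite a formula in Hill order, omitting counts of 1."""
    counts = parse_formula(text)
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] != 1 else "") for e in order)


if __name__ == "__main__":
    print(hill_formula("Cl5 H9 C14"))  # DDT, entered out of order
    print(hill_formula("Na2SO4"))      # no carbon, so strictly alphabetical
```

Entered out of order, the DDT formula comes back as C14H9Cl5, the form used in the search described in this section.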

Below is a portion of the results for searching on the molecular formula, C14H9Cl5, the molecular formula for the insecticide, DDT.

Results of molecular formula search for DDT

Notice how these results include positional isomers (including ones with indefinite positions), stereoisomers and isotopically labelled substances.

Limitations of Identifier and Molecular Formula Searching in SciFinder Scholar

Unfortunately, not all substances are easy to locate in SciFinder Scholar. This reflects the origins of the software. The original commercial version of SciFinder was marketed heavily to companies engaged primarily in organic synthesis, especially the big pharmaceutical firms. As a result, search features which are effective for organic compounds (especially structure searching) are well developed, while other features which might be of greater use to biochemists, polymer chemists, inorganic and organometallic chemists have not yet been implemented.

For example, searching by chemical name will search only for the exact name as entered. This works well for substances with common or trade names, but can be difficult to impossible when trying to use complex systematic names. Moreover, it is impossible to search for families of names, such as in plastics, dyes and other commercial substances where a whole group of materials may vary only in the latter part of the name. Example: Nylon 6, Nylon 66; Lexan 100, Lexan 110.

Many biological substances come in a wide variety of forms, but searching on them by name will only yield the most generic form. Example: Insulin appears in the Registry File in hundreds of different specific forms, depending on species of origin, etc. But a search on "insulin" will yield only the generic Registry Number. Using the "Additional related references" checkbox will find some additional material, but results are not always consistent from one substance to another.

Even for simple organic compounds, searching by name will give only the basic form of the molecule -- and leave out stereoisomers, isotopically substituted forms and the like.

Molecular formula searching has its problems as well. Remember that salts are handled in an odd fashion in print CA? Well, those idiosyncrasies carry over to SciFinder. If you are searching for sodium sulfate (commonly written as Na2SO4), entering the molecular formula as Na2SO4 or Na2O4S won't work. You'll get only a strange result for a compound with no references. Neither will H2O4S.Na2. Only H2O4S.2Na will work. The necessary "smarts" to reinterpret other forms of the formula have not yet been built in. If you do enter the formula correctly, you will get results such as:

Substance records for sodium sulfate
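The pattern behind H2O4S.2Na can be sketched in a few lines: the parent acid is written in Hill order, followed by a period, then the counter-ion with its multiplier written in front of the symbol rather than as a trailing count. The helper below is a hypothetical illustration of that formatting only. It assumes you already know the parent acid's Hill formula and the number of counter-ions, it assumes a multiplier of 1 is left unwritten, and it does not attempt to cover the full CAS rules for salts.

```python
def salt_formula(acid_hill_formula, counter_ion, count):
    """Format a simple salt as 'acid.Ncounter_ion', e.g. H2O4S.2Na."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # The multiplier goes in front of the counter-ion; assumed omitted when it is 1.
    prefix = "" if count == 1 else str(count)
    return f"{acid_hill_formula}.{prefix}{counter_ion}"


if __name__ == "__main__":
    # Sodium sulfate: sulfuric acid (H2O4S in Hill order) plus two sodium atoms.
    print(salt_formula("H2O4S", "Na", 2))
```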

There are some partial workarounds that will help, at least for simple organic and inorganic substances. If you know a simple name for a substance, search that first. Then get the molecular formula from that record, and search it in the form CAS uses to pick up stereoisomers, isotopically labelled substances, etc. If the name you pick doesn't work, try searching the name in a "Research Topic" search. If you can find a reference that way, look at the indexing and you may be able to find a Registry Number that you can use as the start of an "Identifier" search. Alternatively, analyze the resulting records by Registry Number. With any luck the item you are looking for will be among the top few listed Registry Numbers, and you can verify the correct one by clicking on the link to the Registry Record. In the following example, I searched for human insulin as a Research Topic. Analyzing by Registry Number, I found the number for human insulin as the fourth in the overall list. (Note: you can find human insulin using "human insulin" in a search by Substance Identifier; this is just an example of an approach.)

Finding the Registry Number for human insulin by Analyzing a search

In many cases, you will have to resort to structure searching, the most powerful tool in SciFinder.

Even structure searching will not help for polymers, biopolymers (proteins, nucleic acids, etc.), alloys, nonstoichiometric inorganic substances (e.g. the ceramic superconductors) and many other substances. However, there is hope. The features needed to search for most of these substances exist in the Messenger command language on which SciFinder rests. Eventually, ways to tap into them in SciFinder will be implemented, but for now, some searches require resorting to the STN version of the Registry File. This will be discussed further in a later lecture.

This page created by Chuck Huber (huber@library.ucsb.edu).
Updated: 02/18/08 02:08:23